{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ebfd86d8-637c-4d39-bda3-fc144ba8f184",
   "metadata": {},
   "source": [
    "# Decision Trees"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffd87cac-d792-4b3a-9f75-adea1ba08f52",
   "metadata": {},
   "source": [
    "### 1. Load the sample from titanic.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "33f4c4d8-11aa-4647-8e94-8c211a18942c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.tree import DecisionTreeClassifier\n",
    "import pandas as pd\n",
    "\n",
    "# Step 1: load the data\n",
    "data = pd.read_csv('titanic.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93cbfe78-75ae-4815-b5a1-93f06bb366e9",
   "metadata": {},
   "source": [
    "### 2. Keep only the needed columns: the features Pclass, Fare, Age and Sex, plus the target Survived"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "c3a99cf2-7864-4e59-a5f3-c4a1c80fe468",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Explicit copy, so the 'Sex' column can be modified below without affecting `data`\n",
    "features = data[['Pclass', 'Fare', 'Age', 'Sex', 'Survived']].copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "708757cc-244d-435f-8f92-d126baef1c6f",
   "metadata": {},
   "source": [
    "### 3. Convert Sex to a numeric format (male -> 0, female -> 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "f22dcc08-d6db-4687-b189-93b359a9c3f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "features['Sex'] = features['Sex'].map({'male': 0, 'female': 1})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55500700-5fef-4d82-ab87-442fe23184c9",
   "metadata": {},
   "source": [
    "### 4. Drop rows with missing values in the features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "520e50cd-9a9c-4de5-b7be-592af3b4cb97",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "   Pclass     Fare   Age  Sex  Survived\n",
       "0       3   7.2500  22.0    0         0\n",
       "1       1  71.2833  38.0    1         1\n",
       "2       3   7.9250  26.0    1         1\n",
       "3       1  53.1000  35.0    1         1\n",
       "4       3   8.0500  35.0    0         0"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop every row that has a NaN in any of the selected columns\n",
    "features_cleaned = features.dropna()\n",
    "features_cleaned.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa1f7fd6-705d-4750-988e-2199c31bca27",
   "metadata": {},
   "source": [
    "### 5. Split into the feature matrix and the target variable"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "49d99a2e-438f-4998-9083-241ced726673",
   "metadata": {},
   "outputs": [],
   "source": [
    "X = features_cleaned[['Pclass', 'Fare', 'Age', 'Sex']]\n",
    "y = features_cleaned['Survived']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3005b40d-3ef5-4be2-96f1-8e5eff8b23a7",
   "metadata": {},
   "source": [
    "### 6. Create and train the decision tree model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "68e04e9a-6eb0-4320-83bf-da33b68e811f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DecisionTreeClassifier(random_state=241)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# A fixed random_state makes the fitted tree, and therefore its importances, reproducible\n",
    "clf = DecisionTreeClassifier(random_state=241)\n",
    "clf.fit(X, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a31d681f-904a-414c-a23d-9db2327fd71d",
   "metadata": {},
   "source": [
    "### 7. Compute feature importances and find the two most important features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "51bdb322-2be8-49ee-b32b-106b8b496edb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sex     0.300512\n",
      "Fare    0.295385\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Normalized impurity-based (Gini) importance of each column of X\n",
    "importances = clf.feature_importances_\n",
    "\n",
    "importance_series = pd.Series(importances, index=X.columns)\n",
    "print(importance_series.sort_values(ascending=False).head(2))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}