Project initialization

This commit is contained in:
Вера Виноградова 2025-05-14 01:02:21 +03:00
commit deac764f3f
6 changed files with 847 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,2 @@
.venv
A_Z_HandwrittenLetters.csv


@@ -0,0 +1,151 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Recognizing hand-written digits\n\nThis example shows how scikit-learn can be used to recognize images of\nhand-written digits, from 0-9.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause\n\n# Standard scientific Python imports\nimport matplotlib.pyplot as plt\n\n# Import datasets, classifiers and performance metrics\nfrom sklearn import datasets, metrics, svm\nfrom sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Digits dataset\n\nThe digits dataset consists of 8x8\npixel images of digits. The ``images`` attribute of the dataset stores\n8x8 arrays of grayscale values for each image. We will use these arrays to\nvisualize the first 4 images. The ``target`` attribute of the dataset stores\nthe digit each image represents and this is included in the title of the 4\nplots below.\n\nNote: if we were working from image files (e.g., 'png' files), we would load\nthem using :func:`matplotlib.pyplot.imread`.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n\n_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))\nfor ax, image, label in zip(axes, digits.images, digits.target):\n ax.set_axis_off()\n ax.imshow(image, cmap=plt.cm.gray_r, interpolation=\"nearest\")\n ax.set_title(\"Training: %i\" % label)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Classification\n\nTo apply a classifier on this data, we need to flatten the images, turning\neach 2-D array of grayscale values from shape ``(8, 8)`` into shape\n``(64,)``. Subsequently, the entire dataset will be of shape\n``(n_samples, n_features)``, where ``n_samples`` is the number of images and\n``n_features`` is the total number of pixels in each image.\n\nWe can then split the data into train and test subsets and fit a support\nvector classifier on the train samples. The fitted classifier can\nsubsequently be used to predict the value of the digit for the samples\nin the test subset.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# flatten the images\nn_samples = len(digits.images)\ndata = digits.images.reshape((n_samples, -1))\n\n# Create a classifier: a support vector classifier\nclf = svm.SVC(gamma=0.001)\n\n# Split data into 50% train and 50% test subsets\nX_train, X_test, y_train, y_test = train_test_split(\n data, digits.target, test_size=0.5, shuffle=False\n)\n\n# Learn the digits on the train subset\nclf.fit(X_train, y_train)\n\n# Predict the value of the digit on the test subset\npredicted = clf.predict(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below we visualize the first 4 test samples and show their predicted\ndigit value in the title.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))\nfor ax, image, prediction in zip(axes, X_test, predicted):\n ax.set_axis_off()\n image = image.reshape(8, 8)\n ax.imshow(image, cmap=plt.cm.gray_r, interpolation=\"nearest\")\n ax.set_title(f\"Prediction: {prediction}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":func:`~sklearn.metrics.classification_report` builds a text report showing\nthe main classification metrics.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(\n f\"Classification report for classifier {clf}:\\n\"\n f\"{metrics.classification_report(y_test, predicted)}\\n\"\n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also plot a `confusion matrix <confusion_matrix>` of the\ntrue digit values and the predicted digit values.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"disp = metrics.ConfusionMatrixDisplay.from_predictions(y_test, predicted)\ndisp.figure_.suptitle(\"Confusion Matrix\")\nprint(f\"Confusion matrix:\\n{disp.confusion_matrix}\")\n\nplt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the results from evaluating a classifier are stored in the form of a\n`confusion matrix <confusion_matrix>` and not in terms of `y_true` and\n`y_pred`, one can still build a :func:`~sklearn.metrics.classification_report`\nas follows:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The ground truth and predicted lists\ny_true = []\ny_pred = []\ncm = disp.confusion_matrix\n\n# For each cell in the confusion matrix, add the corresponding ground truths\n# and predictions to the lists\nfor gt in range(len(cm)):\n for pred in range(len(cm)):\n y_true += [gt] * cm[gt][pred]\n y_pred += [pred] * cm[gt][pred]\n\nprint(\n \"Classification report rebuilt from confusion matrix:\\n\"\n f\"{metrics.classification_report(y_true, y_pred)}\\n\"\n)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.21"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

app.py Normal file

@@ -0,0 +1,84 @@
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
# -----------------------------
# 1. Load the data
# -----------------------------
# Try to detect whether the CSV has a header row
try:
    df = pd.read_csv('A_Z_HandwrittenLetters.csv')
    if 'label' in df.columns:
        print("Header found, using the 'label' column")
    else:
        raise Exception("Column 'label' not found")
except Exception as e:
    print(f"Error: {e}, retrying without a header row")
    df = pd.read_csv('A_Z_HandwrittenLetters.csv', header=None)

# Split into features and labels
if 'label' in df.columns:
    y = df['label'].values
    X = df.drop('label', axis=1).values
else:
    y = df[0].values
    X = df.drop(0, axis=1).values
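# Sanity check (added sketch, not part of the original script): the
# visualization below assumes each row holds a 28x28 = 784 pixel image,
# which is the layout of the A_Z handwritten letters CSV.
assert X.shape[1] == 784, f"Expected 784 pixel columns, got {X.shape[1]}"
print(f"Dataset: {X.shape[0]} samples, {X.shape[1]} features, "
      f"{len(np.unique(y))} classes")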
# -----------------------------
# 2. Normalization
# -----------------------------
X = X / 255.0

# -----------------------------
# 3. Visualize the first images
# -----------------------------
_, axes = plt.subplots(1, 4, figsize=(10, 5))
for ax, image, label in zip(axes, X[:4], y[:4]):
    ax.imshow(image.reshape(28, 28), cmap='gray')
    ax.axis('off')
    ax.set_title(f"Label: {label}\n({chr(label + ord('A'))})")
plt.suptitle("Sample Training Images")
plt.show()
# -----------------------------
# 4. Train/test split
# -----------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, shuffle=True
)
# -----------------------------
# 5. Train the model
# -----------------------------
print("Training the model...")
clf = SVC(gamma=0.001)
clf.fit(X_train, y_train)
# -----------------------------
# 6. Prediction
# -----------------------------
y_pred = clf.predict(X_test)
# -----------------------------
# 7. Visualize predictions
# -----------------------------
_, axes = plt.subplots(1, 4, figsize=(10, 5))
for ax, image, prediction in zip(axes, X_test, y_pred):
    ax.imshow(image.reshape(28, 28), cmap='gray')
    ax.axis('off')
    ax.set_title(f"Prediction: {prediction}\n({chr(prediction + ord('A'))})")
plt.suptitle("Predicted Letters")
plt.show()
# -----------------------------
# 8. Reports and confusion matrix
# -----------------------------
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
disp = ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
disp.figure_.suptitle("Confusion Matrix")
plt.show()
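# Added sketch (not part of the original script): the report above labels
# classes by their numeric codes 0-25; mapping them to letters makes it
# easier to read. `labels` and `target_names` are standard parameters of
# sklearn.metrics.classification_report.
present = np.unique(np.concatenate([y_test, y_pred]))
letter_names = [chr(ord('A') + int(code)) for code in present]
print(classification_report(y_test, y_pred, labels=present, target_names=letter_names))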

File diff suppressed because one or more lines are too long


@@ -0,0 +1,128 @@
"""
================================
Recognizing hand-written digits
================================
This example shows how scikit-learn can be used to recognize images of
hand-written digits, from 0-9.
"""
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
# Standard scientific Python imports
import matplotlib.pyplot as plt
# Import datasets, classifiers and performance metrics
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split
###############################################################################
# Digits dataset
# --------------
#
# The digits dataset consists of 8x8
# pixel images of digits. The ``images`` attribute of the dataset stores
# 8x8 arrays of grayscale values for each image. We will use these arrays to
# visualize the first 4 images. The ``target`` attribute of the dataset stores
# the digit each image represents and this is included in the title of the 4
# plots below.
#
# Note: if we were working from image files (e.g., 'png' files), we would load
# them using :func:`matplotlib.pyplot.imread`.
digits = datasets.load_digits()
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title("Training: %i" % label)
###############################################################################
# Classification
# --------------
#
# To apply a classifier on this data, we need to flatten the images, turning
# each 2-D array of grayscale values from shape ``(8, 8)`` into shape
# ``(64,)``. Subsequently, the entire dataset will be of shape
# ``(n_samples, n_features)``, where ``n_samples`` is the number of images and
# ``n_features`` is the total number of pixels in each image.
#
# We can then split the data into train and test subsets and fit a support
# vector classifier on the train samples. The fitted classifier can
# subsequently be used to predict the value of the digit for the samples
# in the test subset.
# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
# Create a classifier: a support vector classifier
clf = svm.SVC(gamma=0.001)
# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False
)
# Learn the digits on the train subset
clf.fit(X_train, y_train)
# Predict the value of the digit on the test subset
predicted = clf.predict(X_test)
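# Added sketch (not in the original example): overall accuracy on the test
# subset via the standard metrics.accuracy_score helper.
print(f"Accuracy on the test subset: {metrics.accuracy_score(y_test, predicted):.3f}")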
###############################################################################
# Below we visualize the first 4 test samples and show their predicted
# digit value in the title.
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, predicted):
    ax.set_axis_off()
    image = image.reshape(8, 8)
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Prediction: {prediction}")
###############################################################################
# :func:`~sklearn.metrics.classification_report` builds a text report showing
# the main classification metrics.
print(
    f"Classification report for classifier {clf}:\n"
    f"{metrics.classification_report(y_test, predicted)}\n"
)
###############################################################################
# We can also plot a :ref:`confusion matrix <confusion_matrix>` of the
# true digit values and the predicted digit values.
disp = metrics.ConfusionMatrixDisplay.from_predictions(y_test, predicted)
disp.figure_.suptitle("Confusion Matrix")
print(f"Confusion matrix:\n{disp.confusion_matrix}")
plt.show()
###############################################################################
# If the results from evaluating a classifier are stored in the form of a
# :ref:`confusion matrix <confusion_matrix>` and not in terms of `y_true` and
# `y_pred`, one can still build a :func:`~sklearn.metrics.classification_report`
# as follows:
# The ground truth and predicted lists
y_true = []
y_pred = []
cm = disp.confusion_matrix
# For each cell in the confusion matrix, add the corresponding ground truths
# and predictions to the lists
for gt in range(len(cm)):
    for pred in range(len(cm)):
        y_true += [gt] * cm[gt][pred]
        y_pred += [pred] * cm[gt][pred]
print(
    "Classification report rebuilt from confusion matrix:\n"
    f"{metrics.classification_report(y_true, y_pred)}\n"
)

requirements.txt Normal file

@@ -0,0 +1,114 @@
anyio==4.9.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.5
attrs==25.3.0
babel==2.17.0
beautifulsoup4==4.13.4
bleach==6.2.0
certifi==2025.4.26
cffi==1.17.1
charset-normalizer==3.4.2
colorama==0.4.6
comm==0.2.2
contourpy==1.3.2
cycler==0.12.1
debugpy==1.8.14
decorator==5.2.1
defusedxml==0.7.1
executing==2.2.0
fastjsonschema==2.21.1
fonttools==4.58.0
fqdn==1.5.1
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.10
ipykernel==6.29.5
ipython==9.2.0
ipython_pygments_lexers==1.1.1
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.6
joblib==1.5.0
json5==0.12.0
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2025.4.1
jupyter-events==0.12.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.16.0
jupyter_server_terminals==0.5.3
jupyterlab==4.4.2
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
kiwisolver==1.4.8
liac-arff==2.5.0
MarkupSafe==3.0.2
matplotlib==3.10.3
matplotlib-inline==0.1.7
minio==7.2.15
mistune==3.1.3
nbclient==0.10.2
nbconvert==7.16.6
nbformat==5.10.4
nest-asyncio==1.6.0
notebook_shim==0.2.4
numpy==2.2.5
openml==0.15.1
overrides==7.7.0
packaging==25.0
pandas==2.2.3
pandocfilters==1.5.1
parso==0.8.4
pillow==11.2.1
platformdirs==4.3.8
prometheus_client==0.21.1
prompt_toolkit==3.0.51
psutil==7.0.0
pure_eval==0.2.3
pyarrow==20.0.0
pycparser==2.22
pycryptodome==3.22.0
Pygments==2.19.1
pyparsing==3.2.3
python-dateutil==2.9.0.post0
python-json-logger==3.3.0
pytz==2025.2
pywin32==310
pywinpty==2.0.15
PyYAML==6.0.2
pyzmq==26.4.0
referencing==0.36.2
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.24.0
scikit-learn==1.6.1
scipy==1.15.3
Send2Trash==1.8.3
setuptools==80.4.0
six==1.17.0
sniffio==1.3.1
soupsieve==2.7
stack-data==0.6.3
terminado==0.18.1
threadpoolctl==3.6.0
tinycss2==1.4.0
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
types-python-dateutil==2.9.0.20241206
typing_extensions==4.13.2
tzdata==2025.2
uri-template==1.3.0
urllib3==2.4.0
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
xmltodict==0.14.2