Project initialization

This commit is contained in:
Вера Виноградова 2025-05-14 01:02:21 +03:00
commit deac764f3f
6 changed files with 847 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,2 @@
.venv
A_Z_HandwrittenLetters.csv


@@ -0,0 +1,151 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Recognizing hand-written digits\n\nThis example shows how scikit-learn can be used to recognize images of\nhand-written digits, from 0-9.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause\n\n# Standard scientific Python imports\nimport matplotlib.pyplot as plt\n\n# Import datasets, classifiers and performance metrics\nfrom sklearn import datasets, metrics, svm\nfrom sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Digits dataset\n\nThe digits dataset consists of 8x8\npixel images of digits. The ``images`` attribute of the dataset stores\n8x8 arrays of grayscale values for each image. We will use these arrays to\nvisualize the first 4 images. The ``target`` attribute of the dataset stores\nthe digit each image represents and this is included in the title of the 4\nplots below.\n\nNote: if we were working from image files (e.g., 'png' files), we would load\nthem using :func:`matplotlib.pyplot.imread`.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"digits = datasets.load_digits()\n\n_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))\nfor ax, image, label in zip(axes, digits.images, digits.target):\n ax.set_axis_off()\n ax.imshow(image, cmap=plt.cm.gray_r, interpolation=\"nearest\")\n ax.set_title(\"Training: %i\" % label)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Classification\n\nTo apply a classifier on this data, we need to flatten the images, turning\neach 2-D array of grayscale values from shape ``(8, 8)`` into shape\n``(64,)``. Subsequently, the entire dataset will be of shape\n``(n_samples, n_features)``, where ``n_samples`` is the number of images and\n``n_features`` is the total number of pixels in each image.\n\nWe can then split the data into train and test subsets and fit a support\nvector classifier on the train samples. The fitted classifier can\nsubsequently be used to predict the value of the digit for the samples\nin the test subset.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# flatten the images\nn_samples = len(digits.images)\ndata = digits.images.reshape((n_samples, -1))\n\n# Create a classifier: a support vector classifier\nclf = svm.SVC(gamma=0.001)\n\n# Split data into 50% train and 50% test subsets\nX_train, X_test, y_train, y_test = train_test_split(\n data, digits.target, test_size=0.5, shuffle=False\n)\n\n# Learn the digits on the train subset\nclf.fit(X_train, y_train)\n\n# Predict the value of the digit on the test subset\npredicted = clf.predict(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below we visualize the first 4 test samples and show their predicted\ndigit value in the title.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))\nfor ax, image, prediction in zip(axes, X_test, predicted):\n ax.set_axis_off()\n image = image.reshape(8, 8)\n ax.imshow(image, cmap=plt.cm.gray_r, interpolation=\"nearest\")\n ax.set_title(f\"Prediction: {prediction}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":func:`~sklearn.metrics.classification_report` builds a text report showing\nthe main classification metrics.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(\n f\"Classification report for classifier {clf}:\\n\"\n f\"{metrics.classification_report(y_test, predicted)}\\n\"\n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also plot a `confusion matrix <confusion_matrix>` of the\ntrue digit values and the predicted digit values.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"disp = metrics.ConfusionMatrixDisplay.from_predictions(y_test, predicted)\ndisp.figure_.suptitle(\"Confusion Matrix\")\nprint(f\"Confusion matrix:\\n{disp.confusion_matrix}\")\n\nplt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the results from evaluating a classifier are stored in the form of a\n`confusion matrix <confusion_matrix>` and not in terms of `y_true` and\n`y_pred`, one can still build a :func:`~sklearn.metrics.classification_report`\nas follows:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The ground truth and predicted lists\ny_true = []\ny_pred = []\ncm = disp.confusion_matrix\n\n# For each cell in the confusion matrix, add the corresponding ground truths\n# and predictions to the lists\nfor gt in range(len(cm)):\n for pred in range(len(cm)):\n y_true += [gt] * cm[gt][pred]\n y_pred += [pred] * cm[gt][pred]\n\nprint(\n \"Classification report rebuilt from confusion matrix:\\n\"\n f\"{metrics.classification_report(y_true, y_pred)}\\n\"\n)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.21"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

app.py Normal file

@@ -0,0 +1,84 @@
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
# -----------------------------
# 1. Load the data
# -----------------------------
# Try to detect whether the CSV has a header row
try:
    df = pd.read_csv('A_Z_HandwrittenLetters.csv')
    if 'label' in df.columns:
        print("Header found, using the 'label' column")
    else:
        raise Exception("Column 'label' not found")
except Exception as e:
    print(f"Error: {e}, retrying without a header row")
    df = pd.read_csv('A_Z_HandwrittenLetters.csv', header=None)

# Split into features and labels
if 'label' in df.columns:
    y = df['label'].values
    X = df.drop('label', axis=1).values
else:
    y = df[0].values
    X = df.drop(0, axis=1).values
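# Sanity check (added sketch, not part of the original script): the
# visualization below assumes each row holds a 28x28 = 784 pixel image,
# which is the layout of the A_Z handwritten letters CSV.
assert X.shape[1] == 784, f"Expected 784 pixel columns, got {X.shape[1]}"
print(f"Dataset: {X.shape[0]} samples, {X.shape[1]} features, "
      f"{len(np.unique(y))} classes")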
# -----------------------------
# 2. Normalization
# -----------------------------
X = X / 255.0

# -----------------------------
# 3. Visualize the first images
# -----------------------------
_, axes = plt.subplots(1, 4, figsize=(10, 5))
for ax, image, label in zip(axes, X[:4], y[:4]):
    ax.imshow(image.reshape(28, 28), cmap='gray')
    ax.axis('off')
    ax.set_title(f"Label: {label}\n({chr(label + ord('A'))})")
plt.suptitle("Sample Training Images")
plt.show()
# -----------------------------
# 4. Train/test split
# -----------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, shuffle=True
)
# -----------------------------
# 5. Train the model
# -----------------------------
print("Training the model...")
clf = SVC(gamma=0.001)
clf.fit(X_train, y_train)
# -----------------------------
# 6. Prediction
# -----------------------------
y_pred = clf.predict(X_test)
# -----------------------------
# 7. Visualize predictions
# -----------------------------
_, axes = plt.subplots(1, 4, figsize=(10, 5))
for ax, image, prediction in zip(axes, X_test, y_pred):
    ax.imshow(image.reshape(28, 28), cmap='gray')
    ax.axis('off')
    ax.set_title(f"Prediction: {prediction}\n({chr(prediction + ord('A'))})")
plt.suptitle("Predicted Letters")
plt.show()
# -----------------------------
# 8. Reports and confusion matrix
# -----------------------------
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
disp = ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
disp.figure_.suptitle("Confusion Matrix")
plt.show()
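# Added sketch (not part of the original script): the report above labels
# classes by their numeric codes 0-25; mapping them to letters makes it
# easier to read. `labels` and `target_names` are standard parameters of
# sklearn.metrics.classification_report.
present = np.unique(np.concatenate([y_test, y_pred]))
letter_names = [chr(ord('A') + int(code)) for code in present]
print(classification_report(y_test, y_pred, labels=present, target_names=letter_names))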

File diff suppressed because one or more lines are too long


@@ -0,0 +1,128 @@
"""
================================
Recognizing hand-written digits
================================
This example shows how scikit-learn can be used to recognize images of
hand-written digits, from 0-9.
"""
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
# Standard scientific Python imports
import matplotlib.pyplot as plt
# Import datasets, classifiers and performance metrics
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split
###############################################################################
# Digits dataset
# --------------
#
# The digits dataset consists of 8x8
# pixel images of digits. The ``images`` attribute of the dataset stores
# 8x8 arrays of grayscale values for each image. We will use these arrays to
# visualize the first 4 images. The ``target`` attribute of the dataset stores
# the digit each image represents and this is included in the title of the 4
# plots below.
#
# Note: if we were working from image files (e.g., 'png' files), we would load
# them using :func:`matplotlib.pyplot.imread`.
digits = datasets.load_digits()
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title("Training: %i" % label)
###############################################################################
# Classification
# --------------
#
# To apply a classifier on this data, we need to flatten the images, turning
# each 2-D array of grayscale values from shape ``(8, 8)`` into shape
# ``(64,)``. Subsequently, the entire dataset will be of shape
# ``(n_samples, n_features)``, where ``n_samples`` is the number of images and
# ``n_features`` is the total number of pixels in each image.
#
# We can then split the data into train and test subsets and fit a support
# vector classifier on the train samples. The fitted classifier can
# subsequently be used to predict the value of the digit for the samples
# in the test subset.
# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
# Create a classifier: a support vector classifier
clf = svm.SVC(gamma=0.001)
# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False
)
# Learn the digits on the train subset
clf.fit(X_train, y_train)
# Predict the value of the digit on the test subset
predicted = clf.predict(X_test)
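# Added sketch (not in the original example): overall accuracy on the test
# subset via the standard metrics.accuracy_score helper.
print(f"Accuracy on the test subset: {metrics.accuracy_score(y_test, predicted):.3f}")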
###############################################################################
# Below we visualize the first 4 test samples and show their predicted
# digit value in the title.
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, predicted):
    ax.set_axis_off()
    image = image.reshape(8, 8)
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Prediction: {prediction}")
###############################################################################
# :func:`~sklearn.metrics.classification_report` builds a text report showing
# the main classification metrics.
print(
    f"Classification report for classifier {clf}:\n"
    f"{metrics.classification_report(y_test, predicted)}\n"
)
###############################################################################
# We can also plot a :ref:`confusion matrix <confusion_matrix>` of the
# true digit values and the predicted digit values.
disp = metrics.ConfusionMatrixDisplay.from_predictions(y_test, predicted)
disp.figure_.suptitle("Confusion Matrix")
print(f"Confusion matrix:\n{disp.confusion_matrix}")
plt.show()
###############################################################################
# If the results from evaluating a classifier are stored in the form of a
# :ref:`confusion matrix <confusion_matrix>` and not in terms of `y_true` and
# `y_pred`, one can still build a :func:`~sklearn.metrics.classification_report`
# as follows:
# The ground truth and predicted lists
y_true = []
y_pred = []
cm = disp.confusion_matrix
# For each cell in the confusion matrix, add the corresponding ground truths
# and predictions to the lists
for gt in range(len(cm)):
    for pred in range(len(cm)):
        y_true += [gt] * cm[gt][pred]
        y_pred += [pred] * cm[gt][pred]
print(
    "Classification report rebuilt from confusion matrix:\n"
    f"{metrics.classification_report(y_true, y_pred)}\n"
)

requirements.txt Normal file

@@ -0,0 +1,114 @@
anyio==4.9.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.5
attrs==25.3.0
babel==2.17.0
beautifulsoup4==4.13.4
bleach==6.2.0
certifi==2025.4.26
cffi==1.17.1
charset-normalizer==3.4.2
colorama==0.4.6
comm==0.2.2
contourpy==1.3.2
cycler==0.12.1
debugpy==1.8.14
decorator==5.2.1
defusedxml==0.7.1
executing==2.2.0
fastjsonschema==2.21.1
fonttools==4.58.0
fqdn==1.5.1
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.10
ipykernel==6.29.5
ipython==9.2.0
ipython_pygments_lexers==1.1.1
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.6
joblib==1.5.0
json5==0.12.0
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2025.4.1
jupyter-events==0.12.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.16.0
jupyter_server_terminals==0.5.3
jupyterlab==4.4.2
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
kiwisolver==1.4.8
liac-arff==2.5.0
MarkupSafe==3.0.2
matplotlib==3.10.3
matplotlib-inline==0.1.7
minio==7.2.15
mistune==3.1.3
nbclient==0.10.2
nbconvert==7.16.6
nbformat==5.10.4
nest-asyncio==1.6.0
notebook_shim==0.2.4
numpy==2.2.5
openml==0.15.1
overrides==7.7.0
packaging==25.0
pandas==2.2.3
pandocfilters==1.5.1
parso==0.8.4
pillow==11.2.1
platformdirs==4.3.8
prometheus_client==0.21.1
prompt_toolkit==3.0.51
psutil==7.0.0
pure_eval==0.2.3
pyarrow==20.0.0
pycparser==2.22
pycryptodome==3.22.0
Pygments==2.19.1
pyparsing==3.2.3
python-dateutil==2.9.0.post0
python-json-logger==3.3.0
pytz==2025.2
pywin32==310
pywinpty==2.0.15
PyYAML==6.0.2
pyzmq==26.4.0
referencing==0.36.2
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.24.0
scikit-learn==1.6.1
scipy==1.15.3
Send2Trash==1.8.3
setuptools==80.4.0
six==1.17.0
sniffio==1.3.1
soupsieve==2.7
stack-data==0.6.3
terminado==0.18.1
threadpoolctl==3.6.0
tinycss2==1.4.0
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
types-python-dateutil==2.9.0.20241206
typing_extensions==4.13.2
tzdata==2025.2
uri-template==1.3.0
urllib3==2.4.0
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
xmltodict==0.14.2