Project initialization
commit deac764f3f
2  .gitignore  vendored  Normal file
@@ -0,0 +1,2 @@
.venv
A_Z_HandwrittenLetters.csv
151  .ipynb_checkpoints/plot_digits_classification-checkpoint.ipynb  Normal file
@@ -0,0 +1,151 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Recognizing hand-written digits\n\nThis example shows how scikit-learn can be used to recognize images of\nhand-written digits, from 0-9.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause\n\n# Standard scientific Python imports\nimport matplotlib.pyplot as plt\n\n# Import datasets, classifiers and performance metrics\nfrom sklearn import datasets, metrics, svm\nfrom sklearn.model_selection import train_test_split"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Digits dataset\n\nThe digits dataset consists of 8x8\npixel images of digits. The ``images`` attribute of the dataset stores\n8x8 arrays of grayscale values for each image. We will use these arrays to\nvisualize the first 4 images. The ``target`` attribute of the dataset stores\nthe digit each image represents and this is included in the title of the 4\nplots below.\n\nNote: if we were working from image files (e.g., 'png' files), we would load\nthem using :func:`matplotlib.pyplot.imread`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "digits = datasets.load_digits()\n\n_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))\nfor ax, image, label in zip(axes, digits.images, digits.target):\n    ax.set_axis_off()\n    ax.imshow(image, cmap=plt.cm.gray_r, interpolation=\"nearest\")\n    ax.set_title(\"Training: %i\" % label)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Classification\n\nTo apply a classifier on this data, we need to flatten the images, turning\neach 2-D array of grayscale values from shape ``(8, 8)`` into shape\n``(64,)``. Subsequently, the entire dataset will be of shape\n``(n_samples, n_features)``, where ``n_samples`` is the number of images and\n``n_features`` is the total number of pixels in each image.\n\nWe can then split the data into train and test subsets and fit a support\nvector classifier on the train samples. The fitted classifier can\nsubsequently be used to predict the value of the digit for the samples\nin the test subset.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# flatten the images\nn_samples = len(digits.images)\ndata = digits.images.reshape((n_samples, -1))\n\n# Create a classifier: a support vector classifier\nclf = svm.SVC(gamma=0.001)\n\n# Split data into 50% train and 50% test subsets\nX_train, X_test, y_train, y_test = train_test_split(\n    data, digits.target, test_size=0.5, shuffle=False\n)\n\n# Learn the digits on the train subset\nclf.fit(X_train, y_train)\n\n# Predict the value of the digit on the test subset\npredicted = clf.predict(X_test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Below we visualize the first 4 test samples and show their predicted\ndigit value in the title.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))\nfor ax, image, prediction in zip(axes, X_test, predicted):\n    ax.set_axis_off()\n    image = image.reshape(8, 8)\n    ax.imshow(image, cmap=plt.cm.gray_r, interpolation=\"nearest\")\n    ax.set_title(f\"Prediction: {prediction}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        ":func:`~sklearn.metrics.classification_report` builds a text report showing\nthe main classification metrics.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\n    f\"Classification report for classifier {clf}:\\n\"\n    f\"{metrics.classification_report(y_test, predicted)}\\n\"\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can also plot a `confusion matrix <confusion_matrix>` of the\ntrue digit values and the predicted digit values.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "disp = metrics.ConfusionMatrixDisplay.from_predictions(y_test, predicted)\ndisp.figure_.suptitle(\"Confusion Matrix\")\nprint(f\"Confusion matrix:\\n{disp.confusion_matrix}\")\n\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If the results from evaluating a classifier are stored in the form of a\n`confusion matrix <confusion_matrix>` and not in terms of `y_true` and\n`y_pred`, one can still build a :func:`~sklearn.metrics.classification_report`\nas follows:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# The ground truth and predicted lists\ny_true = []\ny_pred = []\ncm = disp.confusion_matrix\n\n# For each cell in the confusion matrix, add the corresponding ground truths\n# and predictions to the lists\nfor gt in range(len(cm)):\n    for pred in range(len(cm)):\n        y_true += [gt] * cm[gt][pred]\n        y_pred += [pred] * cm[gt][pred]\n\nprint(\n    \"Classification report rebuilt from confusion matrix:\\n\"\n    f\"{metrics.classification_report(y_true, y_pred)}\\n\"\n)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.21"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
84  app.py  Normal file
@@ -0,0 +1,84 @@
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, ConfusionMatrixDisplay

# -----------------------------
# 1. Load the data
# -----------------------------
# Try to detect whether the CSV has a header row
try:
    df = pd.read_csv('A_Z_HandwrittenLetters.csv')
    if 'label' in df.columns:
        print("Header found, using the 'label' column")
    else:
        raise Exception("Column 'label' not found")
except Exception as e:
    print(f"Error: {e}, retrying without a header row")
    df = pd.read_csv('A_Z_HandwrittenLetters.csv', header=None)

# Split into features and labels
if 'label' in df.columns:
    y = df['label'].values
    X = df.drop('label', axis=1).values
else:
    y = df[0].values
    X = df.drop(0, axis=1).values

# -----------------------------
# 2. Normalization
# -----------------------------
X = X / 255.0

# -----------------------------
# 3. Visualize the first images
# -----------------------------
_, axes = plt.subplots(1, 4, figsize=(10, 5))
for ax, image, label in zip(axes, X[:4], y[:4]):
    ax.imshow(image.reshape(28, 28), cmap='gray')
    ax.axis('off')
    ax.set_title(f"Label: {label}\n({chr(label + ord('A'))})")
plt.suptitle("Sample Training Images")
plt.show()

# -----------------------------
# 4. Train/test split
# -----------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, shuffle=True
)

# -----------------------------
# 5. Model training
# -----------------------------
print("Training the model...")
clf = SVC(gamma=0.001)
clf.fit(X_train, y_train)

# -----------------------------
# 6. Prediction
# -----------------------------
y_pred = clf.predict(X_test)

# -----------------------------
# 7. Visualize the predictions
# -----------------------------
_, axes = plt.subplots(1, 4, figsize=(10, 5))
for ax, image, prediction in zip(axes, X_test, y_pred):
    ax.imshow(image.reshape(28, 28), cmap='gray')
    ax.axis('off')
    ax.set_title(f"Prediction: {prediction}\n({chr(prediction + ord('A'))})")
plt.suptitle("Predicted Letters")
plt.show()

# -----------------------------
# 8. Reports and confusion matrix
# -----------------------------
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

disp = ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
disp.figure_.suptitle("Confusion Matrix")
plt.show()
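A minimal follow-up sketch, not part of app.py above: since the plot titles use chr(label + ord('A')), the A_Z labels are assumed to be the integers 0-25, so the step 8 report can show letters instead of integer codes via classification_report's target_names parameter.

from sklearn.metrics import classification_report

def letter_report(y_true, y_pred):
    # Hypothetical helper: per-letter metrics, assuming labels are 0-25
    # and every letter occurs at least once in y_true/y_pred.
    letters = [chr(code + ord('A')) for code in range(26)]
    return classification_report(y_true, y_pred, target_names=letters)

# Usage with the arrays computed in app.py: print(letter_report(y_test, y_pred))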
368  plot_digits_classification.ipynb  Normal file
File diff suppressed because one or more lines are too long
128  plot_digits_classification.py  Normal file
@@ -0,0 +1,128 @@
"""
================================
Recognizing hand-written digits
================================

This example shows how scikit-learn can be used to recognize images of
hand-written digits, from 0-9.

"""

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

###############################################################################
# Digits dataset
# --------------
#
# The digits dataset consists of 8x8
# pixel images of digits. The ``images`` attribute of the dataset stores
# 8x8 arrays of grayscale values for each image. We will use these arrays to
# visualize the first 4 images. The ``target`` attribute of the dataset stores
# the digit each image represents and this is included in the title of the 4
# plots below.
#
# Note: if we were working from image files (e.g., 'png' files), we would load
# them using :func:`matplotlib.pyplot.imread`.

digits = datasets.load_digits()

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title("Training: %i" % label)

###############################################################################
# Classification
# --------------
#
# To apply a classifier on this data, we need to flatten the images, turning
# each 2-D array of grayscale values from shape ``(8, 8)`` into shape
# ``(64,)``. Subsequently, the entire dataset will be of shape
# ``(n_samples, n_features)``, where ``n_samples`` is the number of images and
# ``n_features`` is the total number of pixels in each image.
#
# We can then split the data into train and test subsets and fit a support
# vector classifier on the train samples. The fitted classifier can
# subsequently be used to predict the value of the digit for the samples
# in the test subset.

# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
clf = svm.SVC(gamma=0.001)

# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False
)

# Learn the digits on the train subset
clf.fit(X_train, y_train)

# Predict the value of the digit on the test subset
predicted = clf.predict(X_test)

###############################################################################
# Below we visualize the first 4 test samples and show their predicted
# digit value in the title.

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, predicted):
    ax.set_axis_off()
    image = image.reshape(8, 8)
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Prediction: {prediction}")

###############################################################################
# :func:`~sklearn.metrics.classification_report` builds a text report showing
# the main classification metrics.

print(
    f"Classification report for classifier {clf}:\n"
    f"{metrics.classification_report(y_test, predicted)}\n"
)

###############################################################################
# We can also plot a :ref:`confusion matrix <confusion_matrix>` of the
# true digit values and the predicted digit values.

disp = metrics.ConfusionMatrixDisplay.from_predictions(y_test, predicted)
disp.figure_.suptitle("Confusion Matrix")
print(f"Confusion matrix:\n{disp.confusion_matrix}")

plt.show()

###############################################################################
# If the results from evaluating a classifier are stored in the form of a
# :ref:`confusion matrix <confusion_matrix>` and not in terms of `y_true` and
# `y_pred`, one can still build a :func:`~sklearn.metrics.classification_report`
# as follows:


# The ground truth and predicted lists
y_true = []
y_pred = []
cm = disp.confusion_matrix

# For each cell in the confusion matrix, add the corresponding ground truths
# and predictions to the lists
for gt in range(len(cm)):
    for pred in range(len(cm)):
        y_true += [gt] * cm[gt][pred]
        y_pred += [pred] * cm[gt][pred]

print(
    "Classification report rebuilt from confusion matrix:\n"
    f"{metrics.classification_report(y_true, y_pred)}\n"
)
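A related sketch, assuming only a stored confusion matrix is available (the situation described in the last block above): ConfusionMatrixDisplay can also be constructed directly from that matrix, without access to y_true and y_pred.

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

def plot_stored_confusion_matrix(cm, labels=None):
    # Re-plot a previously computed confusion matrix (e.g. disp.confusion_matrix)
    # without the original predictions.
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
    disp.plot()
    disp.figure_.suptitle("Confusion Matrix (from stored matrix)")
    plt.show()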
114  requirements.txt  Normal file
@@ -0,0 +1,114 @@
anyio==4.9.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.5
attrs==25.3.0
babel==2.17.0
beautifulsoup4==4.13.4
bleach==6.2.0
certifi==2025.4.26
cffi==1.17.1
charset-normalizer==3.4.2
colorama==0.4.6
comm==0.2.2
contourpy==1.3.2
cycler==0.12.1
debugpy==1.8.14
decorator==5.2.1
defusedxml==0.7.1
executing==2.2.0
fastjsonschema==2.21.1
fonttools==4.58.0
fqdn==1.5.1
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.10
ipykernel==6.29.5
ipython==9.2.0
ipython_pygments_lexers==1.1.1
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.6
joblib==1.5.0
json5==0.12.0
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2025.4.1
jupyter-events==0.12.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.16.0
jupyter_server_terminals==0.5.3
jupyterlab==4.4.2
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
kiwisolver==1.4.8
liac-arff==2.5.0
MarkupSafe==3.0.2
matplotlib==3.10.3
matplotlib-inline==0.1.7
minio==7.2.15
mistune==3.1.3
nbclient==0.10.2
nbconvert==7.16.6
nbformat==5.10.4
nest-asyncio==1.6.0
notebook_shim==0.2.4
numpy==2.2.5
openml==0.15.1
overrides==7.7.0
packaging==25.0
pandas==2.2.3
pandocfilters==1.5.1
parso==0.8.4
pillow==11.2.1
platformdirs==4.3.8
prometheus_client==0.21.1
prompt_toolkit==3.0.51
psutil==7.0.0
pure_eval==0.2.3
pyarrow==20.0.0
pycparser==2.22
pycryptodome==3.22.0
Pygments==2.19.1
pyparsing==3.2.3
python-dateutil==2.9.0.post0
python-json-logger==3.3.0
pytz==2025.2
pywin32==310
pywinpty==2.0.15
PyYAML==6.0.2
pyzmq==26.4.0
referencing==0.36.2
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.24.0
scikit-learn==1.6.1
scipy==1.15.3
Send2Trash==1.8.3
setuptools==80.4.0
six==1.17.0
sniffio==1.3.1
soupsieve==2.7
stack-data==0.6.3
terminado==0.18.1
threadpoolctl==3.6.0
tinycss2==1.4.0
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
types-python-dateutil==2.9.0.20241206
typing_extensions==4.13.2
tzdata==2025.2
uri-template==1.3.0
urllib3==2.4.0
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
xmltodict==0.14.2