{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Biclustering: HIGGS Dataset\n", "\n", "Dataset: **HIGGS** (ID=23512), particle physics, Large Hadron Collider. \n", "https://www.openml.org/search?type=data&sort=runs&status=active&id=23512\n", "\n", "**Goal:** \n", "Apply `SpectralBiclustering` to find groups of physics events and features that behave similarly.\n", "\n", "**About the dataset:** \n", "Each row is a single particle collision. The task is to separate signal (Higgs boson production) from background (ordinary processes). There are 28 numeric features: the first 21 are kinematic measurements, and the last 7 are derived features computed by physicists." ], "id": "48ff89307f6d208a" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.1. Importing libraries" ], "id": "c425859772d5010f" }, { "cell_type": "code", "metadata": { "ExecuteTime": { "end_time": "2026-05-07T18:17:43.295524600Z", "start_time": "2026-05-07T18:17:42.752717900Z" } }, "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.datasets import fetch_openml\n", "from sklearn.cluster import SpectralBiclustering\n", "from sklearn.preprocessing import StandardScaler" ], "id": "fa924c2a992315b0", "outputs": [], "execution_count": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2. Loading the data from OpenML" ], "id": "c42a3bcc8663ca" }, { "cell_type": "code", "metadata": { "ExecuteTime": { "end_time": "2026-05-07T18:17:53.146300500Z", "start_time": "2026-05-07T18:17:43.295524600Z" } }, "source": [ "# 23512 is the OpenML data ID of the HIGGS dataset\n", "print(\"Loading HIGGS from OpenML\")\n", "higgs = fetch_openml(data_id=23512, as_frame=True, parser=\"auto\")\n", "\n", "print(f\"Loaded! Shape: {higgs.frame.shape}\")\n", "print(f\"Features: {higgs.feature_names}\")\n", "higgs.frame.head(3)" ], "id": "c77c11ce624d28d0", "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading HIGGS from OpenML\n",
"Loaded! Shape: (98050, 29)\n", "Features: ['lepton_pT', 'lepton_eta', 'lepton_phi', 'missing_energy_magnitude', 'missing_energy_phi', 'jet1pt', 'jet1eta', 'jet1phi', 'jet1b-tag', 'jet2pt', 'jet2eta', 'jet2phi', 'jet2b-tag', 'jet3pt', 'jet3eta', 'jet3phi', 'jet3b-tag', 'jet4pt', 'jet4eta', 'jet4phi', 'jet4b-tag', 'm_jj', 'm_jjj', 'm_lv', 'm_jlv', 'm_bb', 'm_wbb', 'm_wwbb']\n" ] }, { "data": { "text/plain": [ " class lepton_pT lepton_eta lepton_phi missing_energy_magnitude \\\n", "0 1 0.907542 0.329147 0.359412 1.497970 \n", "1 1 0.798835 1.470639 -1.635975 0.453773 \n", "2 0 1.344385 -0.876626 0.935913 1.992050 \n", "\n", " missing_energy_phi jet1pt jet1eta jet1phi jet1b-tag ... jet4eta \\\n", "0 -0.313010 1.095531 -0.557525 -1.588230 2.173076 ... -1.138930 \n", "1 0.425629 1.104875 1.282322 1.381664 0.000000 ... 1.128848 \n", "2 0.882454 1.786066 -1.646778 -0.942383 0.000000 ... -0.678379 \n", "\n", " jet4phi jet4b-tag m_jj m_jjj m_lv m_jlv m_bb \\\n", "0 -0.000819 0.0 0.302220 0.833048 0.985700 0.978098 0.779732 \n", "1 0.900461 0.0 0.909753 1.108330 0.985692 0.951331 0.803252 \n", "2 -1.360356 0.0 0.946652 1.028704 0.998656 0.728281 0.869200 \n", "\n", " m_wbb m_wwbb \n", "0 0.992356 0.798343 \n", "1 0.865924 0.780118 \n", "2 1.026736 0.957904 \n", "\n", "[3 rows x 29 columns]" ], "text/html": [ "
|  | class | lepton_pT | lepton_eta | lepton_phi | missing_energy_magnitude | missing_energy_phi | jet1pt | jet1eta | jet1phi | jet1b-tag | ... | jet4eta | jet4phi | jet4b-tag | m_jj | m_jjj | m_lv | m_jlv | m_bb | m_wbb | m_wwbb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.907542 | 0.329147 | 0.359412 | 1.497970 | -0.313010 | 1.095531 | -0.557525 | -1.588230 | 2.173076 | ... | -1.138930 | -0.000819 | 0.0 | 0.302220 | 0.833048 | 0.985700 | 0.978098 | 0.779732 | 0.992356 | 0.798343 |
| 1 | 1 | 0.798835 | 1.470639 | -1.635975 | 0.453773 | 0.425629 | 1.104875 | 1.282322 | 1.381664 | 0.000000 | ... | 1.128848 | 0.900461 | 0.0 | 0.909753 | 1.108330 | 0.985692 | 0.951331 | 0.803252 | 0.865924 | 0.780118 |
| 2 | 0 | 1.344385 | -0.876626 | 0.935913 | 1.992050 | 0.882454 | 1.786066 | -1.646778 | -0.942383 | 0.000000 | ... | -0.678379 | -1.360356 | 0.0 | 0.946652 | 1.028704 | 0.998656 | 0.728281 | 0.869200 | 1.026736 | 0.957904 |
3 rows × 29 columns
\n", "