Mendeley Phishing Websites Dataset

Name: Mendeley Phishing Websites Dataset
Alias: Phishing Websites Dataset, Mendeley Phishing
Domain: Cybersecurity, Phishing Detection
Type: Web data (URLs, HTML, metadata)

Basic information

Size: Varies (the publication used a sample of 1000 pages: 500 phishing + 500 legitimate)
Split: Balanced binary classification
Classes/Categories: 2 classes (Phishing, Legitimate)
Format: URL, HTML source, metadata
License: Open access (Mendeley Data)
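
The balanced 500 + 500 sample described above can be sketched as a seeded draw per class. This is only an illustration: the (url, label) record layout, the label strings, and the helper name balanced_sample are assumptions made here, not part of the dataset or of the publication's method.

```python
import random

def balanced_sample(records, per_class=500, seed=0):
    """Draw `per_class` records for each label from a list of (url, label) pairs."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_label = {}
    for url, label in records:
        by_label.setdefault(label, []).append(url)
    sample = []
    for label in sorted(by_label):
        urls = by_label[label]
        if len(urls) < per_class:
            raise ValueError(f"only {len(urls)} records for label {label!r}")
        sample.extend((url, label) for url in rng.sample(urls, per_class))
    rng.shuffle(sample)  # avoid grouping the classes in the output order
    return sample

# Synthetic stand-in records; real entries would come from the dataset files.
records = [(f"http://phish{i}.example", "phishing") for i in range(800)]
records += [(f"https://legit{i}.example", "legitimate") for i in range(1200)]
sample = balanced_sample(records, per_class=500, seed=42)
```

Fixing the seed keeps the subset stable across runs, which matters when results on a sample are compared between models.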

Mendeley Phishing Websites Dataset is a collection of phishing and legitimate web pages gathered from various sources. The dataset contains URLs, HTML source, and metadata.

The data come from:

The dataset is used to train and evaluate machine learning and deep learning models for phishing detection.
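
As a rough illustration of how a URL field could be turned into model inputs, the sketch below derives a few lexical features with the Python standard library. The feature set and the function name url_features are assumptions chosen for illustration; the source does not say which features any particular model uses.

```python
import ipaddress
from urllib.parse import urlparse

def url_features(url):
    """Return a small dict of lexical features for one URL (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary host names
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip,
        "num_host_dots": host.count("."),
        "has_at_symbol": "@" in url,
        "num_digits": sum(c.isdigit() for c in url),
    }

# 192.0.2.10 is a documentation-only address, used here as a stand-in.
features = url_features("http://192.0.2.10/login/verify?id=123")
```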

phishdebate-2025 - Performance benchmarking of a multi-agent LLM framework on a sample of 500 phishing + 500 legitimate pages

Model/System	Metric	Score	Year	Publication
PhishDebate (GPT-4o)	Accuracy	96.50%	2025	PhishDebate
PhishDebate (GPT-4o)	Precision	94.97%	2025	PhishDebate
PhishDebate (GPT-4o)	Recall	98.20%	2025	PhishDebate
PhishDebate (GPT-4o)	F1 Score	96.56%	2025	PhishDebate
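
The four metrics in the table are linked through the confusion matrix, and a short sketch makes the relationship concrete. The counts used below are hypothetical: they are not reported in the source, and were picked here only because, on a 500 + 500 sample, they reproduce the table's values after rounding to two decimals.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts (assumption, not from the publication):
# 500 phishing pages -> tp + fn = 500; 500 legitimate pages -> fp + tn = 500.
metrics = binary_metrics(tp=491, fp=26, fn=9, tn=474)
rounded = {name: round(100 * value, 2) for name, value in metrics.items()}
```

With precision below recall, a system of this shape flags some legitimate pages as phishing more often than it misses phishing pages.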

The dataset draws on mixed sources, which provides a variety of phishing tactics
Well suited to balanced binary classification tasks
The HTML source may require preprocessing (cleaning, feature extraction)
Metadata may include third-party features (WHOIS, Alexa rank) - check availability
Mendeley Data provides a persistent DOI for reproducibility
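
The HTML preprocessing mentioned in the notes above might look like the following minimal sketch, which uses only the standard-library html.parser module. The class name PageFeatures, the helper extract_features, and the chosen features (visible text, form count, password inputs, link targets) are assumptions for illustration, not a pipeline described by the source.

```python
from html.parser import HTMLParser

class PageFeatures(HTMLParser):
    """Collect visible text and a few tag counts, skipping <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.links = []
        self.num_forms = 0
        self.num_password_inputs = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "form":
            self.num_forms += 1
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.num_password_inputs += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())

def extract_features(html):
    """Parse one HTML document and return cleaned text plus simple counts."""
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(parser.text_parts),
        "num_forms": parser.num_forms,
        "num_password_inputs": parser.num_password_inputs,
        "links": parser.links,
    }

# Synthetic page used as a stand-in for a dataset record.
sample_html = """<html><head><style>body{color:red}</style>
<script>var x = 1;</script></head>
<body><h1>Sign in</h1>
<form action="/login"><input type="text" name="user">
<input type="password" name="pw"></form>
<a href="http://example.com/help">Help</a></body></html>"""
page = extract_features(sample_html)
```

Real phishing pages are often malformed, so a more tolerant parser may be needed in practice; the standard-library parser is used here only to keep the sketch dependency-free.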