T-Social

Basic information

Name: T-Social
Alias: T-Soc, Transaction Social Dataset
Domain: Fraud Detection, Social Networks, Cybersecurity
Type: Graph data (social network)

Source

URL: Available through GADBench
Paper: Rethinking graph neural networks for anomaly detection (Tang et al., 2022)
Organization: Michigan State University
Year: 2022

Characteristics

Size: 5,781,065 nodes, 73,105,508 edges
Split: User-defined (typically 5-fold cross-validation)
Classes/Categories: Binary (legitimate accounts vs abnormal accounts)
Format: Graph structure with node features
License: Available through GADBench
Feature dimension: 10 features (user profile details such as logging activities)
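
Because the split is left to the user and the classes are binary, a stratified 5-fold split over node indices is one reasonable way to set up the evaluation. The sketch below is only an illustration in plain Python: the function name `stratified_kfold` and the toy labels are hypothetical, and nothing here is taken from the GADBench code base.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with roughly equal class ratios per fold."""
    rng = random.Random(seed)

    # Group node indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Shuffle each class and deal its indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


# Toy labels (hypothetical): 1 = abnormal account, 0 = legitimate account.
toy_labels = [1] * 10 + [0] * 90
splits = list(stratified_kfold(toy_labels, k=5, seed=0))
```

On the real graph the same idea applies to the array of node labels; only the node indices are split, while the edges stay shared across folds.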

Description

T-Social is the largest dataset in the GADBench suite, intended for detecting abnormal accounts in social networks. The dataset represents a social network as a graph in which nodes are user accounts and edges are social friendship connections.

Node features contain user profile details such as logging activities, account creation time, behavioral patterns, and engagement metrics. The dataset is particularly challenging because of its massive scale (about 5.8M nodes and 73M edges).
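
To make the scale concrete, the following back-of-the-envelope sketch estimates the raw memory needed for the node features and the edge list. The storage choices are assumptions rather than properties of the released files: 4-byte floats for features and two 8-byte integer endpoints per edge. The helper name `estimate_footprint` is hypothetical.

```python
NUM_NODES = 5_781_065
NUM_EDGES = 73_105_508
NUM_FEATURES = 10


def estimate_footprint(num_nodes, num_edges, num_features,
                       feat_bytes=4, index_bytes=8):
    """Rough in-memory size of a dense feature matrix and a COO edge list."""
    feature_bytes = num_nodes * num_features * feat_bytes
    # Each edge stores a source and a destination node index.
    edge_bytes = num_edges * 2 * index_bytes
    return {
        "features_mb": feature_bytes / 1e6,
        "edges_mb": edge_bytes / 1e6,
        "edges_per_node": num_edges / num_nodes,
    }


estimate = estimate_footprint(NUM_NODES, NUM_EDGES, NUM_FEATURES)
```

Under these assumptions the features take roughly 231 MB and the edge list roughly 1.17 GB, so the graph structure, not the 10-dimensional features, dominates memory. This is one reason full-batch GNN training on this dataset tends to be the scalability bottleneck.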

Applications

Social network fraud detection
Bot/fake account detection
Spam account identification
Large-scale graph anomaly detection
Scalability testing for GNN methods
Benchmarking fraud detection algorithms

Used in publications

Global Attribute-Association Pattern Aggregation for Graph Fraud Detection: GAAP achieved 97.25% Rec@K, the best reported result (+1.28 pp improvement). T-Social was the largest dataset in the experiments. Removing the GNN module had its largest impact on this dataset (97.05% → 23.32%, a drop of 73.73 pp), because extracting graph structure information is a crucial factor for performance.

Benchmarks

Model	Metric	Score	Year	Publication
GAAP	Rec@K	97.25%	2025	Duan et al. AAAI-25
BGNN	Rec@K	96.89%	2021	Ivanov et al.
DGA-GNN	Rec@K	95.97%	2024	Duan et al.
RFGraph	Rec@K	93.58%	2024	Tang et al. GADBench
XGBGraph	Rec@K	93.53%	2024	Tang et al. GADBench
GHRN	Rec@K	82.33%	2023	Gao et al.
PMP	Rec@K	81.11%	2024	Zhuo et al.
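
The table reports Rec@K without defining it. The sketch below assumes the definition commonly used in graph anomaly detection benchmarks: recall within the top-K scored nodes, with K set to the number of true anomalies in the evaluated set. The function name `recall_at_k` and the toy data are hypothetical; check the cited papers for the exact protocol behind each score.

```python
def recall_at_k(labels, scores):
    """Recall among the top-K scored nodes, where K = number of true anomalies.

    labels: list of 0/1 ground-truth labels (1 = abnormal).
    scores: list of anomaly scores, higher means more suspicious.
    """
    k = sum(labels)
    if k == 0:
        raise ValueError("Rec@K is undefined without positive labels")

    # Rank node indices by descending anomaly score and keep the top K.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]

    hits = sum(labels[i] for i in top_k)
    return hits / k


# Toy example (hypothetical data): 3 anomalies, 2 of them ranked in the top 3.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3]
toy_rec = recall_at_k(toy_labels, toy_scores)
```

With K tied to the number of anomalies, a score of 97% means that almost all of the highest-ranked accounts are truly abnormal, which is a stricter reading than plain accuracy on an imbalanced dataset.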

Notes

Largest dataset in the GADBench suite (about 5.8M nodes, 73M edges)
Graph structure is critical: Removing GNN causes dramatic performance drop (97.05% → 23.32%)
Relation concept: Social friendship between accounts; node features are user profile details such as logging activities
Testing ground for the scalability of fraud detection methods
Part of GADBench benchmark suite
Very high Rec@K (97%+) is achievable thanks to strong graph structure signals

Tags

dataset fraud-detection social-networks graph-data large-scale bot-detection fake-accounts gadbench benchmark
