The Enron Corpus: A New Dataset for Email Classification Research
Metadata
- Authors: B. Klimt, Y. Yang
- Year: 2004
- Source: Machine Learning: ECML 2004, Springer, Pages 217–226
- DOI/Link: 10.1007/978-3-540-30115-8_22
- Status: to-read
- Origin: Extracted from al-subaiey-web-ai-phishing-2024 ([35], the original Enron Corpus paper; dataset source)
- Tags: to-read reference enron-corpus dataset-paper email-classification benchmark ecml
Notes
Publication added automatically from the bibliography.
Citation context in Al-Subaiey 2024:
- Referenced as [35] in the paper (References section)
- Original paper describing the Enron Email Corpus
- Dataset used as one of 6 sources in Al-Subaiey’s merged dataset (82,486 emails total)
- Enron Corpus: foundational public email dataset enabling reproducible research
Historical Significance:
- One of the first large-scale public corpora of real email (paper published 2004)
- Enabled a shift from proprietary datasets to open, reproducible research
- Widely cited across email classification and related research
- Still widely used 20 years later (including by Al-Subaiey 2024)
Dataset Characteristics (from original paper):
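- Raw corpus of about 619,000 messages from 158 Enron Corporation employees (about 200,000 after the authors' cleaning); later public releases contain roughly 500,000 emails
- Made public by FERC (Federal Energy Regulatory Commission) during its investigation into Enron's collapse
- Authentic business emails (not synthetic)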
- Multiple research applications: spam detection, topic classification, social network analysis
Impact on Al-Subaiey 2024:
- Provided realistic business email data
- One of the sources in the merged dataset of 82,486 emails
- Improves generalizability by representing business-context email
Limitation:
- Emails date from 1999-2002 (dated; phishing tactics have since evolved)
- Al-Subaiey addresses this by combining it with more recent datasets (CEAS, Nazario, etc.)
Add the PDF to generate a full summary using /summarize-paper klimt-enron-corpus-2004