The Enron Corpus: A New Dataset for Email Classification Research

Metadata

Notes

Publication added automatically from the bibliography.

Citation context in Al-Subaiey 2024:

  • Referenced as [35] in the paper (References section)
  • Original paper describing the Enron Email Corpus
  • Dataset used as one of 6 sources in Al-Subaiey’s merged dataset (82,486 emails total)
  • Enron Corpus: foundational public email dataset enabling reproducible research

Historical Significance:

  • First large-scale public email corpus (2004)
  • Enabled transition from proprietary datasets to open research
  • Cited thousands of times (>10,000 citations across the research literature)
  • Still widely used 20 years later (including Al-Subaiey 2024)

Dataset Characteristics (from original paper):

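  • ~500,000 emails from Enron Corporation employees in the widely distributed release (the original paper reports 619,446 raw messages from 158 users, cleaned to 200,399)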
  • Released by FERC (Federal Energy Regulatory Commission) after Enron's bankruptcy
  • Real-world authentic business emails (not synthetic)
  • Multiple research applications: spam detection, topic classification, social network analysis

Impact on Al-Subaiey 2024:

  • Provided realistic business email data
  • Part of the merged dataset of 82,486 emails
  • Contributes to generalizability (business context representation)

Limitation:

  • Emails date from 1999-2002 (dated; phishing tactics have evolved since)
  • Al-Subaiey addresses this by combining the corpus with more recent datasets (CEAS, Nazario, etc.)

Add a PDF to generate a full summary using /summarize-paper klimt-enron-corpus-2004
