NIST Phish Scale

Basic information

  • Name: NIST Phish Scale
  • Alias: Phish Scale, NIST TN 2276
  • Domain: Cybersecurity, Phishing Detection
  • Type: Measurement framework, difficulty assessment tool

Source

Characteristics

  • Size: Framework (not a dataset per se)
  • Split: Categorizes phishing lures into difficulty levels
  • Classes/Categories: 3x3 matrix (Cues: Few/Some/Many × Alignment: Weak/Medium/Strong)
  • Format: Assessment methodology
  • License: Public domain (US Government work)

Description

The NIST Phish Scale is a standardized framework for rating how difficult a phishing lure is to detect. It lets organizations benchmark lure complexity and put simulation outcomes in context.

Two-Dimensional Assessment:

  1. Phishing Cues - Observable errors or inconsistencies in the email that can alert users:

    • Spelling mistakes
    • Suspicious URLs
    • Formatting irregularities
    • Sender authenticity markers
    • Scale: Few (hardest to detect) → Many (easiest to detect)
  2. Premise Alignment - Relevance of the email to the organizational context:

    • Subject matter alignment with the recipient’s job
    • Sender plausibility
    • Content alignment with typical communications
    • Scale: Weak (low relevance) → Strong (high relevance)

Difficulty Classification:

  • Easy: High cues + Low alignment (Many cues, Weak premise)
  • Medium: Moderate on both dimensions (Some cues, Medium premise)
  • Hard: Low cues + High alignment (Few cues, Strong premise)
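
The three classifications above can be written as a small lookup. This is a minimal sketch, not code from the NIST publication: the names `classify_difficulty` and `DIFFICULTY_BY_CELL` are hypothetical, and only the three combinations listed above are mapped. The other six cells of the 3x3 matrix are not specified in this card, so the function returns None for them rather than guessing.

```python
from typing import Optional

# Only the three cells listed in this card are mapped. The remaining six
# cells of the 3x3 matrix are deliberately left out (assumption: unknown here).
DIFFICULTY_BY_CELL = {
    ("many", "weak"): "easy",
    ("some", "medium"): "medium",
    ("few", "strong"): "hard",
}

CUE_LEVELS = {"few", "some", "many"}
ALIGNMENT_LEVELS = {"weak", "medium", "strong"}


def classify_difficulty(cues: str, alignment: str) -> Optional[str]:
    """Return 'easy', 'medium' or 'hard' for a listed cell, else None."""
    cues_key = cues.strip().lower()
    alignment_key = alignment.strip().lower()
    if cues_key not in CUE_LEVELS:
        raise ValueError(f"unknown cue level: {cues!r}")
    if alignment_key not in ALIGNMENT_LEVELS:
        raise ValueError(f"unknown alignment level: {alignment!r}")
    # .get() yields None for cells this card does not define.
    return DIFFICULTY_BY_CELL.get((cues_key, alignment_key))


if __name__ == "__main__":
    print(classify_difficulty("Few", "Strong"))
    print(classify_difficulty("Few", "Weak"))
```

Returning None for unlisted cells keeps the sketch honest about what this card covers; a full implementation would need the complete matrix from the NIST documentation.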

Validation:

  • Early validation: Barrientos et al. (2021) in lab settings (n=117)
  • Large-scale validation: Rozema & Davis (2025) in an enterprise setting (n=12,511)

Applications

  • Benchmarking phishing simulation difficulty
  • Standardized reporting of training effectiveness
  • Comparing results across organizations
  • Calibrating phishing campaigns to appropriate difficulty levels
  • Avoiding “gaming metrics” (inflating results by using only easy lures)
  • Research: controlling for lure complexity in experiments
  • Vendor accountability: transparent difficulty assessment

Used in publications

  • anti-phishing-training-2025 - First large-scale enterprise validation (N=12,511); F(2,12086)=41.415, p<0.001; click rates: 7.0% (easy) → 15.0% (hard)

Benchmarks

Enterprise Validation (Rozema & Davis 2025):

Difficulty   Click Rate   N       Context
Easy         7.0%         5,721   High cues, low alignment
Medium       8.7%         2,279   Some cues, medium alignment
Hard         15.0%        4,511   Few cues, high alignment

Statistical Effect:

  • F(2, 12086) = 41.415, p < 0.001
  • η² = 0.007 (small but meaningful)
  • Practical significance: Click rates roughly doubled from easy (7.0%) to hard (15.0%)
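
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this card. It assumes the F statistic comes from a one-way ANOVA, in which case η² = (F · df_between) / (F · df_between + df_within). The pooled click rate is not reported in the source; it is merely implied by the rounded per-group rates. All function and variable names are illustrative.

```python
# Figures copied from the enterprise validation table in this card.
GROUPS = {
    "easy": {"click_rate": 0.070, "n": 5721},
    "medium": {"click_rate": 0.087, "n": 2279},
    "hard": {"click_rate": 0.150, "n": 4511},
}


def eta_squared_from_f(f_value: float, df_between: int, df_within: int) -> float:
    """Recover eta squared from F, assuming a one-way ANOVA."""
    explained = f_value * df_between
    return explained / (explained + df_within)


def pooled_click_rate(groups: dict) -> float:
    """Sample-size-weighted mean of the per-group click rates."""
    total_n = sum(g["n"] for g in groups.values())
    weighted = sum(g["click_rate"] * g["n"] for g in groups.values())
    return weighted / total_n


hard_to_easy_ratio = GROUPS["hard"]["click_rate"] / GROUPS["easy"]["click_rate"]
total_n = sum(g["n"] for g in GROUPS.values())
eta_sq = eta_squared_from_f(41.415, 2, 12086)

if __name__ == "__main__":
    print(f"total N: {total_n}")                                # 12511
    print(f"hard / easy ratio: {hard_to_easy_ratio:.2f}")       # 2.14
    print(f"implied pooled rate: {pooled_click_rate(GROUPS):.3f}")
    print(f"eta squared from F: {eta_sq:.3f}")                  # 0.007
```

The group sizes sum to the stated N of 12,511, and the η² recovered from F agrees with the reported 0.007 after rounding.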

Lab Validation (Canham et al. 2024, n=117):

  • Confirmed Phish Scale predicts differential susceptibility
  • Validated the framework in a controlled academic setting

Notes

Strengths:

  • First standardized, open framework for phishing difficulty
  • Two-dimensional assessment (cues × alignment) captures nuance
  • Enables cross-organizational comparison
  • Prevents “teaching to the test” (vendors gaming metrics)
  • Public domain - free to use
  • Validated at both lab and enterprise scale

Limitations:

  • Subjective assessment (requires expert raters)
  • Inter-rater reliability requires multiple assessors
  • Time-consuming to apply (expert review needed)
  • May not capture AI-generated phishing, which often lacks traditional flaws
  • Focused on email phishing (unclear applicability to SMS, voice)

Practical Considerations:

  • Requires 2-3 trained raters for reliability
  • Disagreements resolved through discussion
  • Best used with organizational context knowledge
  • Should update as phishing tactics evolve
  • LLM-generated phishing may challenge framework (perfect grammar, valid certs)
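
One way to check whether two raters agree well enough before they resolve disagreements is a chance-corrected agreement score. The sketch below computes Cohen's kappa for two raters. This statistic is an assumption chosen for illustration: the card does not say which reliability measure the framework or the cited studies use, and the ratings in the example are made up.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined (0/0), so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical difficulty ratings for six lures (illustrative data only).
    rater_a = ["easy", "easy", "medium", "hard", "hard", "medium"]
    rater_b = ["easy", "medium", "medium", "hard", "hard", "hard"]
    print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # about 0.50
```

With three raters, a multi-rater statistic such as Fleiss' kappa would be the analogous choice; the two-rater version is shown here only because it is the simplest case.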

Future Directions (from Anti-Phishing Training 2025):

  • Adaptation for AI-generated phishing attacks
  • Extension to non-email modalities (SMS, voice, deepfakes)
  • Automated assessment tools (reduce manual rating burden)
  • Integration with phishing simulation platforms

Tags

framework phishing-detection nist standardization difficulty-measurement cybersecurity human-factors validation