A Survey on Truth Discovery
Metadata
- Authors: Yaliang Li, Jing Gao, Chuishi Meng, Qi Li, Lu Su, Bo Zhao, Wei Fan, Jiawei Han
- Year: 2016
- Source: ACM SIGKDD Explorations Newsletter, Vol. 17, No. 2, pp. 1-16
- DOI/Link: 10.1145/2897350.2897352 (arXiv:1505.02463)
- Status: to-read
- Origin: Extracted from phishchain-2022 ([5], truth discovery algorithms from the database community)
- Tags: to-read reference truth-discovery crowd-sourcing data-quality em-algorithm glad survey
Notes
Publication added automatically from the bibliography.
Citation context in PhishChain 2022:
- Referenced as [5] in the Truth Discovery Module section
- Key observation: existing truth discovery algorithms (from the database community) perform poorly on the PhishChain problem
- Reason: traditional algorithms (EM, GLAD) assume that a majority of verifiers respond to each task, which does NOT hold for URL verification
PhishChain findings:
- PhishTank retrospective: only a handful of verifiers check each URL, despite thousands of verifiers in total
- This sparse verification scenario violates the assumptions of the EM and GLAD algorithms
- Motivated PhishChain’s PageRank-based truth discovery approach
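The sparsity argument can be checked directly on a verification log. The sketch below uses a hypothetical toy log (the verifier IDs, URL IDs and votes are invented for illustration, not PhishTank data) and computes, for each URL, the fraction of all verifiers who voted on it. Under the majority-response assumption noted above for EM and GLAD, every fraction would exceed 0.5.

```python
from collections import defaultdict

# Hypothetical toy log: (verifier_id, url_id, vote), vote 1 = phishing, 0 = legitimate.
# Invented for illustration; not real PhishTank data.
votes = [
    ("v1", "u1", 1), ("v2", "u1", 1),
    ("v3", "u2", 0),
    ("v4", "u3", 1), ("v5", "u3", 0), ("v6", "u3", 1),
]

def coverage_per_task(log):
    """Return {url_id: fraction of all distinct verifiers who voted on that URL}."""
    all_verifiers = {v for v, _, _ in log}
    responders = defaultdict(set)
    for verifier, url, _ in log:
        responders[url].add(verifier)
    return {url: len(vs) / len(all_verifiers) for url, vs in responders.items()}

coverage = coverage_per_task(votes)
# True only if a strict majority of verifiers responded to every task.
majority_assumption_holds = all(frac > 0.5 for frac in coverage.values())
print(majority_assumption_holds)  # → False
```

With six verifiers, no URL gets more than three votes, so the assumption fails even on this tiny log; with thousands of verifiers and a handful of votes per URL, the per-task fraction would be far smaller.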
Baseline algorithms benchmarked:
- EM (Expectation Maximization): 93.71% accuracy on PhishTank 2020
- GLAD: 93.98% accuracy on PhishTank 2020
- PhishChain PR-based: 95.45% accuracy (outperforms both)
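The accuracy gaps look small in percentage points, so it can help to express them as a relative reduction in errors. The sketch below is plain arithmetic on the three reported figures, assuming error rate = 100 - accuracy (i.e. accuracy is the plain percentage of correctly labelled URLs).

```python
# Accuracies (%) on PhishTank 2020, as listed in the baseline comparison above.
accuracy = {"EM": 93.71, "GLAD": 93.98, "PhishChain PR-based": 95.45}

def error_rate(acc_pct):
    """Error rate in percentage points, assuming accuracy = 100 - error rate."""
    return 100.0 - acc_pct

pr_error = error_rate(accuracy["PhishChain PR-based"])
results = {}
for name in ("EM", "GLAD"):
    base_error = error_rate(accuracy[name])
    results[name] = {
        # Absolute accuracy gain of the PR-based approach, in percentage points.
        "gain_pp": accuracy["PhishChain PR-based"] - accuracy[name],
        # Share of the baseline's errors that the PR-based approach avoids.
        "rel_error_reduction": (base_error - pr_error) / base_error,
    }
    r = results[name]
    print(f"{name}: +{r['gain_pp']:.2f} pp accuracy, {r['rel_error_reduction']:.1%} fewer errors")
```

This gives +1.74 pp over EM (about 27.7% fewer errors) and +1.47 pp over GLAD (about 24.4% fewer errors).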
Survey coverage:
- Truth discovery algorithms for conflicting crowd-sourced data
- Inferring ground truth from assessments with varying expertise
- Database community approaches to data quality
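Inferring ground truth from assessments of varying quality usually rests on one loop: sources that agree with the current truths get higher reliability, and claims backed by reliable sources are more likely to be chosen as truths. The sketch below is a minimal illustration of that iterative principle for categorical claims, not a reproduction of any specific algorithm from the survey. The CRH-style `-log(rate / total)` weight update, the `eps` smoothing, the function name `truth_discovery` and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def truth_discovery(claims, n_iter=10, eps=1e-3):
    """Alternate reliability-weighted voting and source-weight re-estimation.

    claims: list of (source, object, value) triples with categorical values.
    Returns (truths, weights). Hypothetical helper; the log-ratio weight
    update is an assumed CRH-style choice, not the only option.
    """
    sources = {s for s, _, _ in claims}
    weights = {s: 1.0 for s in sources}  # start from uniform reliability
    truths = {}
    for _ in range(n_iter):
        # Step 1: truth computation by reliability-weighted voting per object.
        scores = defaultdict(lambda: defaultdict(float))
        for s, obj, val in claims:
            scores[obj][val] += weights[s]
        truths = {obj: max(vals, key=vals.get) for obj, vals in scores.items()}
        # Step 2: count each source's disagreements with the current truths.
        errors = defaultdict(float)
        counts = defaultdict(int)
        for s, obj, val in claims:
            counts[s] += 1
            errors[s] += val != truths[obj]
        # Smoothed per-source error rate, so no rate is exactly zero.
        rates = {s: (errors[s] + eps) / counts[s] for s in sources}
        total = sum(rates.values())
        # Lower error rate -> larger weight; positive whenever there are >= 2 sources.
        weights = {s: -math.log(rates[s] / total) for s in sources}
    return truths, weights

# Hypothetical toy data: s3 contradicts the other sources on every object.
claims = [
    ("s1", "u1", 1), ("s2", "u1", 1), ("s3", "u1", 0),
    ("s1", "u2", 0), ("s2", "u2", 0), ("s3", "u2", 1),
    ("s1", "u3", 1), ("s3", "u3", 0),
]
truths, weights = truth_discovery(claims)
print(truths)  # → {'u1': 1, 'u2': 0, 'u3': 1}
```

Note that u3 has only two votes and they conflict; it is resolved in favour of s1 only because s1's reliability was learned on the other objects. When most objects have just a few votes, as in the sparse URL verification setting above, there is much less overlap to learn reliabilities from.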
Add the PDF to generate a full summary using /summarize-paper li-truth-discovery-survey-2016