GuardReasoner: Towards Reasoning-Based LLM Safeguards

Metadata

Authors: Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun Xia, Tianyi Wu, Zhiwei Xue, Yulin Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi
Year: 2025
Source: arXiv e-prints, arXiv-2501
DOI/Link: https://github.com/yueliu1999/GuardReasoner/?tab=readme-ov-file
Status: to-read
Origin: Extracted from phishsense-1b-2025
Tags: to-read reference llm safeguards reasoning

Notes

Publication added automatically from the bibliography. A framework for fine-tuning LLMs on security-focused reasoning tasks. Used as the base methodology in Phishsense-1B.

Context of use in Phishsense-1B:

The base model training stage uses the GuardReasoner methodology
It adapts llama-3.2-1B for improved reasoning capabilities
It is a key element of the two-tiered approach (reasoning + phishing-specific)

Add a PDF to generate a full summary using /summarize-paper liu-guardreasoner-2025
