INDEX
Explanations
The neuron detects the occurrence of the word “error” (notably as part of the assistant’s “If you believe this is an error…” feedback request).
Negative Logits
Token        Logit
影響          -0.07
_none        -0.06
abbrev       -0.06
persists     -0.06
CENTER       -0.06
schizophren  -0.06
mutex        -0.06
post         -0.06
("/");↵      -0.06
marking      -0.06
Positive Logits
Token        Logit
stretched    0.07
_mini        0.07
(KP          0.07
그가          0.07
dolor        0.06
$core        0.06
(today       0.06
/calendar    0.06
DOWNLOAD     0.06
총            0.06
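The source does not say how the token lists above were computed. One common approach, assumed here, is to take the dot product of the neuron's output direction with each row of the model's unembedding matrix and then rank the tokens by the result. The sketch below illustrates that ranking on toy data; `top_logit_tokens`, `direction`, `unembed`, and `vocab` are hypothetical names, and none of the numbers come from the model.

```python
def top_logit_tokens(direction, unembed, vocab, k=3):
    """Rank tokens by the dot product of `direction` with each unembedding row.

    Returns (k most negative, k most positive) as lists of (token, effect).
    """
    effects = [sum(d * w for d, w in zip(direction, row)) for row in unembed]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy values, chosen only to illustrate the ranking.
vocab = ["a", "b", "c", "d"]
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
direction = [0.07, -0.06]

negative, positive = top_logit_tokens(direction, unembed, vocab, k=1)
print(negative, positive)
```

Under this assumption, small magnitudes such as those listed above would mean the direction shifts each token's logit only slightly.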
Activations Density 0.002%
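As a rough illustration of the density figure, the sketch below assumes that activation density means the fraction of sampled tokens on which the neuron's activation is above zero. The function name `activation_density`, the zero threshold, and the sample data are assumptions made for this example and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the neuron fires on 2 of 100,000 tokens.
sample = [0.0] * 99_998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.002%
```

At this density the neuron would fire on roughly 1 token in 50,000, which fits a detector for a single specific word.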