Explanations
Denials or accusations
This neuron detects reporting-style verbs and attribution cues (e.g. said, denied, according to, reported).
Negative Logits
Token       Logit
Bài         -0.06
guna        -0.06
("./        -0.06
ATA         -0.06
Это         -0.06
труд        -0.06
่าว         -0.06
Systems     -0.06
decrease    -0.06
handler     -0.06
Positive Logits
Token             Logit
іч                0.07
Instant           0.07
duino             0.07
nič               0.07
/documentation    0.06
rua               0.06
мор               0.06
TOP               0.06
_foot             0.06
TP                0.06
Activations Density 0.024%