INDEX
Explanations
pronouns
This neuron detects explicit sexual content and erotic requests, especially language about nudity, sexual acts, or coercive/abusive sexual behavior.
Third-person pronouns that refer to people, especially object and plural forms.
Negative Logits
Token        Logit
visited      -0.07
_UNDEFINED   -0.06
ones         -0.06
national     -0.06
Def          -0.06
Last         -0.06
Warn         -0.06
Niagara      -0.06
cone         -0.06
seq          -0.06
Positive Logits
Token        Logit
'i           0.07
tohoto       0.07
||↵          0.07
root         0.06
stato        0.06
(priority    0.06
’dan         0.06
ги           0.06
개의         0.06
sayf         0.06
Activations Density 0.237%
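
The dashboard does not say how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the neuron's activation exceeds zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron fires;
    the source dashboard does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 1000 tokens.
sample = [0.0] * 997 + [0.4, 1.2, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.237% would mean the neuron is active on roughly 2 to 3 of every 1,000 token positions.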