Explanations
critical online discussions
The neuron fires on direct-address insults and profanity (e.g. “you,” “screw yourself,” “f*ck you”), i.e. hostile or abusive second-person language.
Negative Logits
laughs        -0.07
Mon           -0.07
/includes     -0.07
โรค           -0.06
ListTile      -0.06
subclasses    -0.06
REC           -0.06
recognition   -0.06
Songs         -0.06
ادات          -0.06
Positive Logits
ResourceId    0.06
_{            0.06
距            0.06
ULLET         0.06
cev           0.06
ولي           0.06
rotten        0.06
_:            0.06
--- ↵         0.06
ện            0.06
Activations Density 0.064%