Explanations
The neuron activates on occurrences of the slash-separated gendered pronoun form “he/she”.
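To make the explanation concrete, here is a minimal sketch of a text-level proxy for the pattern it describes. It is not the neuron itself: it only locates the literal string "he/she" in text, which is one way to pick candidate positions to compare against the neuron's real activations. The function name `find_he_she`, the case-insensitive matching, and the whole-word boundaries are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Assumption: the pattern of interest is the literal slash-separated form "he/she",
# matched as a whole unit and case-insensitively (so "He/She" also counts).
HE_SHE = re.compile(r"\bhe/she\b", re.IGNORECASE)


def find_he_she(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character spans of every "he/she" occurrence in text."""
    return [m.span() for m in HE_SHE.finditer(text)]


if __name__ == "__main__":
    sample = "If the applicant agrees, he/she must sign the form."
    for start, end in find_he_she(sample):
        print(start, end, sample[start:end])
```

The word boundaries keep the pattern from matching inside longer strings such as "the/sheep"; whether the neuron itself makes that distinction is not something the explanation states.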
Negative Logits
Roc            -0.06
sunt           -0.06
EmptyEntries   -0.06
acid           -0.06
ENCHMARK       -0.06
Cold           -0.06
이야           -0.06
sweetness      -0.06
<N             -0.06
explanatory    -0.06
Positive Logits
lesia          0.08
IRM            0.07
grupo          0.06
markers        0.06
LES            0.06
gồm            0.06
erotik         0.06
Admir          0.06
moz            0.06
_Def           0.06
Activations Density 0.002%
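
As a rough guide to scale, the sketch below converts the density figure into an expected count of activating tokens. It assumes that "activations density" means the fraction of token positions on which the neuron has a nonzero activation; the page does not define the term, so treat that reading, and the hypothetical helper `expected_activations`, as assumptions.

```python
def expected_activations(density_percent: float, n_tokens: int) -> float:
    """Expected number of activating token positions among n_tokens.

    Assumes density_percent is the percentage of token positions
    on which the neuron fires (an assumed definition).
    """
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    # With the 0.002% density reported above and a sample of one million tokens:
    print(expected_activations(0.002, 1_000_000))
```

Under that reading, a density of 0.002% corresponds to about 2 activating positions per 100,000 tokens, which fits a neuron tied to a single rare surface form.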