Explanations
Hate and dislike
This neuron activates on words and phrases expressing strong negative emotions or interpersonal conflict (e.g. hate, annoy, nuts).
Negative Logits
ет          -0.07
Traits      -0.06
ADDRESS     -0.06
Both        -0.06
trata       -0.06
tracking    -0.06
contained   -0.06
disaster    -0.06
Request     -0.06
Tears       -0.06
Positive Logits
tö          0.07
�           0.07
příro       0.06
showc       0.06
seviy       0.06
visor       0.06
=_("        0.06
نت          0.06
haze        0.06
nive        0.06
Activations Density 0.048%
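
The density figure above can be read as the share of sampled token positions on which the neuron fires. The sketch below assumes that reading; the page itself does not state the exact definition. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 5 of which fire.
sample = [0.0] * 9995 + [0.3, 1.2, 0.7, 2.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.048% would correspond to roughly 48 active positions per 100,000 sampled tokens.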