Explanations
negative/critical sentiment
The neuron fires on bluntly negative or insulting terms—especially profanity and pejoratives (e.g. “bullshit,” “brainwashed,” “dogma,” “subhuman”).
Negative Logits
molecular     -0.07
Webb          -0.07
béné          -0.06
truth         -0.06
orce          -0.06
Ill           -0.06
moons         -0.06
Jain          -0.06
rally         -0.06
Validation    -0.06
Positive Logits
continuation  0.07
peuvent       0.06
metis         0.06
nomin         0.06
extremist     0.06
ул            0.06
\htdocs       0.06
vt            0.06
bits          0.06
bios          0.06
Activations Density 0.132%