Explanations
Humanity and/or humans
The neuron fires on derogatory language aimed at “humans,” particularly adjectives that insult or demean people (e.g., “pathetic,” “stupid”).
Negative Logits
_rf
-0.08
]){↵
-0.07
zm
-0.07
секрет
-0.06
Zh
-0.06
-full
-0.06
кид
-0.06
lorsque
-0.06
concaten
-0.06
Cols
-0.06
Positive Logits
.lua
0.07
concentrates
0.07
Cooperative
0.06
Ottoman
0.06
_DEFINITION
0.06
ment
0.06
(Object
0.06
bitter
0.06
Tests
0.06
kavram
0.06
Activations Density 0.021%