Explanations
awards and honors
This neuron detects mentions of honors, awards, or recognitions (e.g., receiving or being nominated for an award).
Negative Logits
vert         -0.06
lil          -0.06
(match       -0.06
tote         -0.06
peninsula    -0.06
rotary       -0.06
testcase     -0.06
�            -0.06
with         -0.05
ESPN         -0.05
Positive Logits
Gameplay     0.08
于           0.07
asca         0.07
Nap          0.07
cé           0.06
σσότε        0.06
Venezuela    0.06
μεγά         0.06
Blowjob      0.06
ceased       0.06
Activations Density 0.023%
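
The density figure above can be read as the share of sampled tokens on which the neuron fires. The page does not state how it is computed, so the following is only a minimal sketch under that assumption: `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the neuron is active (activation strictly above the threshold).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token activations, 2 of them non-zero.
sample = [0.0] * 9998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.023% would mean the neuron is active on roughly 23 out of every 100,000 sampled tokens.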