Explanations
conversational language
The neuron detects phrases about media or public reaction (e.g. mentions of “media,” “outlets,” “react,” “report,” “would”).
Negative Logits
Token       Logit
MAP         -0.07
Lion        -0.07
все         -0.07
bowed       -0.06
nested      -0.06
ườ          -0.06
Versions    -0.06
reload      -0.06
Salon       -0.06
kara        -0.06
Positive Logits
Token       Logit
看          0.07
======↵     0.06
figura      0.06
(find       0.06
胡          0.06
skins       0.06
neighbours  0.06
尽管        0.06
Aus         0.06
downfall    0.06
Activations Density: 0.052%