INDEX
Explanations
accuracy
This neuron activates on words that signal formal mathematical rigor or exactness (e.g. “exact,” “rigorous,” “certified”).
Negative Logits
("\"          -0.07
_black        -0.07
mua           -0.07
.Permission   -0.06
munition      -0.06
iece          -0.06
summaries     -0.06
test          -0.06
zero          -0.06
sin           -0.06
Positive Logits
iếng          0.07
Salvador      0.07
บาง           0.06
aptor         0.06
Hull          0.06
amm           0.06
&,            0.06
ισχ           0.06
ΩΝ            0.06
EIF           0.06
Activations Density 0.032%