Explanations
automatically
The neuron flags adverbs (especially ones ending in “-ly”).
Negative Logits
Token    Logit
Dar      -0.07
sin      -0.07
||       -0.06
679      -0.06
lí       -0.06
sd       -0.06
III      -0.06
kü       -0.06
cliff    -0.06
.cl      -0.05
Positive Logits
Token              Logit
automatically      0.14
autom              0.09
autom              0.09
автом              0.08
Headquarters       0.08
Autom              0.08
ocom               0.08
automáticamente    0.07
OPT                0.07
汽车               0.07
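Logit lists like these are typically produced by projecting the neuron's output weights through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that method. The function name `logit_attribution`, the array shapes, and the toy values are hypothetical, and this dashboard may apply extra steps (such as folding in the final layer norm) that are not shown here.

```python
import numpy as np

def logit_attribution(w_out, w_u, vocab, k=10):
    """Rank tokens by the neuron's direct effect on their logits (assumed method).

    w_out: shape (d_model,), the neuron's output weight vector (hypothetical input).
    w_u:   shape (d_model, vocab_size), the unembedding matrix (hypothetical input).
    vocab: list of token strings of length vocab_size.
    Returns (positive, negative): the k highest- and k lowest-scoring (token, score) pairs.
    """
    scores = w_out @ w_u              # one direct-effect score per vocabulary token
    order = np.argsort(scores)        # token indices sorted from lowest to highest score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 2-dimensional model with a 3-token vocabulary.
w_out = np.array([1.0, 0.0])
w_u = np.array([[0.14, -0.07, 0.09],
                [0.0, 0.0, 0.0]])
pos, neg = logit_attribution(w_out, w_u, ["automatically", "Dar", "autom"], k=1)
print(pos)  # [('automatically', 0.14)]
print(neg)  # [('Dar', -0.07)]
```

Because the score is a single dot product per token, it captures only the neuron's direct path to the output; effects routed through later layers are not reflected in these lists.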
Activation density: 0.008%
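Activation density is the share of tokens on which the neuron fires. The following is a minimal sketch that assumes "fires" means an activation strictly above a threshold of zero. The dashboard's exact threshold and token sample are not stated, and the function name `activation_density` is hypothetical. Under that assumption, a density of 0.008% corresponds to roughly 8 active tokens per 100,000.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float(100.0 * np.count_nonzero(acts > threshold) / acts.size)

# 8 active tokens out of 100,000 gives the density shown above.
acts = np.zeros(100_000)
acts[:8] = 1.0
print(activation_density(acts))  # 0.008
```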