Explanations
prepositions
The neuron primarily activates on numeric tokens representing decimal or fractional values (e.g. tokens beginning with “.”).
Negative Logits
Token      Logit
Numbers    -0.07
choked     -0.07
zd         -0.06
-show      -0.06
مش         -0.06
Lawyer     -0.06
Mu         -0.06
arnings    -0.06
MB         -0.06
toes       -0.06
Positive Logits
Token                               Logit
систему                             0.08
"'                                  0.06
встанов                             0.06
ICU                                 0.06
rig                                 0.06
ュー                                  0.06
_friends                            0.06
Coh                                 0.06
Expedition                          0.06
::::::::::::::::::::::::::::::::    0.06
Activations Density 0.088%