Explanations
outcomes
This neuron activates on numerical tokens representing decimal fractions (e.g., probabilities) in the text.
Negative Logits
римін       -0.06
authors     -0.06
/text       -0.06
fig         -0.06
ertype      -0.06
trot        -0.06
toilets     -0.06
interrupt   -0.06
_md         -0.06
ιά          -0.06
Positive Logits
раніше      0.07
dün         0.07
蘇          0.07
stove       0.06
Comparison  0.06
_cu         0.06
therm       0.06
requ        0.06
넷          0.06
            0.06
Activations Density: 0.008%