Explanations
The neuron strongly activates on isolated single-letter tokens (e.g. standalone variables or letters).
Negative Logits
Token           Logit
restaurant      -0.08
مط              -0.07
SQ              -0.07
comida          -0.07
istinguished    -0.06
/km             -0.06
アル            -0.06
ucch            -0.06
narr            -0.06
/disc           -0.06
Positive Logits
Token           Logit
اگر             0.06
-selection      0.06
_destroy        0.06
español         0.06
steer           0.06
an              0.06
spielen         0.06
+↵              0.06
elective        0.06
'])↵            0.06
Activations Density: 0.047%
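
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled tokens on which the neuron's activation exceeds zero. That definition, the function name `activation_density`, and the sample data are all illustrative assumptions and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all; the dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 47 firing tokens out of 100,000 sampled tokens.
sample = [1.0] * 47 + [0.0] * (100_000 - 47)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.047% corresponds to the neuron firing on roughly 47 of every 100,000 tokens, which fits a neuron that responds only to a narrow class of inputs such as isolated single-letter tokens.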