Explanations
This neuron never activates—it does not detect any pattern in the text.
Negative Logits
ğitim         -0.07
册            -0.07
_old          -0.07
Kazakhstan    -0.06
Syn           -0.06
CIT           -0.06
尿            -0.06
گن            -0.06
Syn           -0.06
742           -0.06
Positive Logits
outdoors      0.07
degrees       0.07
SCORE         0.07
press         0.06
energy        0.06
deut          0.06
подв          0.06
color         0.06
oug           0.06
Wes           0.06
Activations Density 0.005%