Explanations
punctuation
This neuron never activates; it does not respond to any token.
Negative Logits
Token          Logit
compares       -0.07
ايران          -0.07
allocations    -0.07
ाच             -0.07
Latitude       -0.06
configur       -0.06
Resources      -0.06
елов           -0.06
.tech          -0.06
")↵            -0.06

Positive Logits
Token          Logit
екотор         0.07
Lebens         0.06
done           0.06
pylab          0.06
-flag          0.06
licht          0.06
Dio            0.06
-HT            0.06
bel            0.06
ungen          0.06
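The page does not say how these two lists were produced. A common approach, assumed here, is to project the neuron's output weight vector through the model's unembedding matrix, which gives each vocabulary token a direct logit effect per unit of activation, and then to take the k most negative and k most positive entries. The sketch below uses pure Python; the names `w_out`, `w_u`, and `top_logit_effects` are hypothetical.

```python
def top_logit_effects(w_out, w_u, vocab, k=10):
    """Rank tokens by the neuron's direct effect on their logits.

    Hypothetical inputs (assumed, not taken from the page):
      w_out: list of d_model floats, the neuron's output weights
      w_u:   d_model x d_vocab nested list, the unembedding matrix
      vocab: list of d_vocab token strings
    """
    d_model, d_vocab = len(w_out), len(vocab)
    # Dot product of w_out with each unembedding column = logit effect per token.
    effects = [
        sum(w_out[d] * w_u[d][t] for d in range(d_model))
        for t in range(d_vocab)
    ]
    # Sort token indices from most negative to most positive effect.
    ranked = sorted(range(d_vocab), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in ranked[:k]]
    # Reverse the top slice so the largest positive effect comes first.
    positive = [(vocab[t], effects[t]) for t in reversed(ranked[-k:])]
    return negative, positive
```

With a toy two-dimensional model, `top_logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1, -0.7], [0.3, 0.3, 0.3, 0.3]], ["a", "b", "c", "d"], k=2)` lists `d` then `b` as negative and `a` then `c` as positive. Small, nearly uniform magnitudes like the ±0.06 to ±0.07 values above suggest no single token is strongly promoted or suppressed.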
Activation Density: 0.021%
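Assuming activation density means the percentage of evaluated token positions where the neuron's activation exceeds zero (the page does not define it), 0.021% corresponds to roughly 21 activating tokens per 100,000. A minimal sketch of that calculation follows; the function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes `activations` is a flat list of per-token activation values
    collected over some evaluation corpus (hypothetical input).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

For example, `activation_density([0.0, 0.0, 1.2, 0.0])` returns `25.0`, since one of four positions is active.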