Explanations
This neuron activates on occurrences of the substring “ice”, effectively acting as a detector for the token “ice”.
Negative Logits
Token         Logit
Ferdinand     -0.07
Freund        -0.07
164           -0.07
comprehend    -0.07
srov          -0.07
Tal           -0.07
Fon           -0.06
transmitted   -0.06
344           -0.06
ενο           -0.06
Positive Logits
Token         Logit
ice           0.18
Ice           0.16
Ice           0.16
ICE           0.11
ices          0.10
ice           0.09
icy           0.09
IC            0.08
ici           0.08
iceberg       0.08
Activations Density 0.007%