Explanations
The neuron detects occurrences of the token “worm” (in both singular and plural forms).
Negative Logits
fluffy      -0.08
intuit      -0.08
Senate      -0.07
、高         -0.07
etal        -0.06
Joe         -0.06
insanity    -0.06
кле         -0.06
_safe       -0.06
Symphony    -0.06
Positive Logits
worms       0.13
worm        0.12
worm        0.11
Worm        0.10
�           0.09
-vesm       0.09
m           0.08
(cm         0.08
term        0.07
(tm         0.07
Activations Density 0.002%
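
The density figure can be read as the share of token positions on which the neuron fires. The following is a minimal sketch, assuming density means the fraction of positions whose activation exceeds a threshold; the source does not state its exact definition. The function name, the threshold, and the sample data are hypothetical and chosen only to reproduce a value of the same magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of active token positions.
    The dashboard's own definition is not given in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron fires on 2 of 100,000 token positions.
acts = [0.0] * 100_000
acts[10] = 3.2
acts[500] = 1.7

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.002%
```

Under this reading, a density of 0.002% corresponds to roughly one active position per 50,000 tokens, which fits a neuron that responds to a single rare word.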