Explanations
The neuron fires on the token “up”, whether it appears as a standalone word or as the prefix of a longer word beginning with “up-”.
Negative Logits
Prote      -0.08
Nom        -0.07
Synthetic  -0.07
sterile    -0.07
sen        -0.07
errno      -0.07
Karen      -0.07
故         -0.06
ゾ         -0.06
filename   -0.06
Positive Logits
(up        0.07
(update    0.07
Up         0.07
according  0.07
upd        0.07
up         0.07
fino       0.07
increased  0.07
,只        0.06
[+         0.06
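The page does not say how these two lists are produced. A common approach, assumed here and not confirmed by the source, is to project the neuron's output weight vector through the model's unembedding matrix and rank the resulting per-token values. The sketch below uses hypothetical names (`w_out`, `W_U`, `vocab`, `logit_effects`, `top_and_bottom`) and toy numbers purely to illustrate that ranking step.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    w_out: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Direct effect of one neuron on every vocabulary logit (assumed method).

    w_out: hypothetical output weight vector of the neuron (length d_model).
    W_U:   hypothetical unembedding matrix with d_model rows and len(vocab) columns.
    """
    d_model = len(w_out)
    # Effect on token j is the dot product of w_out with column j of W_U.
    return {
        tok: sum(w_out[i] * W_U[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = ["up", "Up", "Karen", "errno"]
w_out = [1.0, -0.5]
W_U = [
    [0.1, 0.2, -0.1, 0.0],
    [0.0, 0.1, 0.1, 0.2],
]
effects = logit_effects(w_out, W_U, vocab)
positive, negative = top_and_bottom(effects, k=2)
```

In a real model the vocabulary has tens of thousands of entries, so the same ranking would normally be done with a vectorized matrix product and a top-k routine rather than Python loops.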
Activation density: 0.021%
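Activation density is usually read as the fraction of tokens on which the neuron is active; the exact activation threshold is not stated on the page, so the sketch below assumes "active" means strictly greater than a hypothetical `threshold` of 0. Under that reading, 0.021% corresponds to roughly 21 active tokens in every 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# 21 firing tokens out of 100,000 gives the density reported above.
acts = [1.0] * 21 + [0.0] * 99_979
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low indicates a sparse, highly selective neuron, which is consistent with it responding to a narrow token pattern such as “up”.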