Explanations
The neuron detects the prefix “Pre” at the start of words (i.e., tokens beginning with “Pre-”).
Negative Logits
Elliot     -0.08
Solomon    -0.07
221        -0.07
Lomb       -0.07
jom        -0.07
Donovan    -0.07
oit        -0.07
олит       -0.07
olson      -0.07
Joan       -0.07
Positive Logits
pre        0.13
Pre        0.12
Pre        0.12
.Pre       0.11
.pre       0.11
_pre       0.11
/pre       0.09
PRE        0.09
_Pre       0.09
de         0.09
Activations Density: 0.046%
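
As a rough sanity check, the explanation can be compared against the top positive logit tokens listed on this page. The sketch below is illustrative only: the helper `starts_with_pre` is a hypothetical name, and its matching rule (drop leading non-letter characters, then test for a case-insensitive "pre" prefix) is an assumption, not the method used to produce the explanation. Note also that the explanation describes what the neuron activates on, while the logit list describes which output tokens it promotes, so agreement between the two is suggestive rather than conclusive.

```python
import re

# Top positive-logit tokens, copied verbatim from the list on this page.
positive_logits = [
    ("pre", 0.13),
    ("Pre", 0.12),
    ("Pre", 0.12),
    (".Pre", 0.11),
    (".pre", 0.11),
    ("_pre", 0.11),
    ("/pre", 0.09),
    ("PRE", 0.09),
    ("_Pre", 0.09),
    ("de", 0.09),
]


def starts_with_pre(token: str) -> bool:
    """Hypothetical rule: strip leading non-letters, then test for 'pre' (any case)."""
    stripped = re.sub(r"^[^A-Za-z]+", "", token)
    return stripped.lower().startswith("pre")


# Split the listed tokens into those that fit the "Pre" prefix pattern and those that do not.
matches = [tok for tok, _ in positive_logits if starts_with_pre(tok)]
misses = [tok for tok, _ in positive_logits if not starts_with_pre(tok)]

print(f"{len(matches)} of {len(positive_logits)} tokens match; misses: {misses}")
```

Under this rule, every listed positive token except "de" fits the prefix pattern, which is broadly consistent with the explanation.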