Explanations
This neuron activates on tokens ending in “ist”, the suffix that often marks professional or specialist titles (for example “therapist” or “specialist”).
Negative Logits
Token        Logit
Freedom      -0.07
Innoc        -0.07
unre         -0.06
erro         -0.06
(peer        -0.06
"></         -0.06
ocoa         -0.06
WEB          -0.06
Kara         -0.06
re           -0.06

Positive Logits
Token        Logit
ist          0.14
ists         0.12
IST          0.12
therapist    0.09
specialist   0.09
ologist      0.09
list         0.09
mist         0.09
rist         0.09
isz          0.09
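
As a rough sanity check on the explanation, the sketch below tests how many of the ten positive-logit tokens listed here literally end in "ist". The token/value pairs are copied from this page; the helper name `ends_in_ist` is hypothetical, and the comparison is case-insensitive by assumption. Note that positive logits are usually read as the tokens a neuron promotes in the output, which is related to, but not the same as, the tokens it activates on.

```python
# Top positive-logit tokens and values, copied from the listing on this page.
positive_logits = {
    "ist": 0.14,
    "ists": 0.12,
    "IST": 0.12,
    "therapist": 0.09,
    "specialist": 0.09,
    "ologist": 0.09,
    "list": 0.09,
    "mist": 0.09,
    "rist": 0.09,
    "isz": 0.09,
}


def ends_in_ist(token: str) -> bool:
    """Hypothetical helper: case-insensitive check for the literal suffix 'ist'."""
    return token.strip().lower().endswith("ist")


# Split the tokens into those that fit the suffix hypothesis and those that do not.
matches = [t for t in positive_logits if ends_in_ist(t)]
exceptions = [t for t in positive_logits if not ends_in_ist(t)]

print("end in 'ist':", matches)
print("exceptions:  ", exceptions)
```

Eight of the ten tokens end in "ist"; the two exceptions, "ists" and "isz", are close variants. Tokens such as "list" and "mist" match the suffix without being professional titles, so the suffix itself, rather than the "profession" reading, appears to be the more reliable description.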
Activations Density 0.032%
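
To make the density figure more concrete, the following sketch converts 0.032% into a fraction and an approximate "one in N tokens" rate. It assumes that the density is the share of sampled tokens on which the neuron has a non-zero activation; that definition and the helper name `expected_activations` are assumptions, not something stated on this page.

```python
# "Activations Density" value from this page, in percent.
density_percent = 0.032

# Convert the percentage to a plain fraction of tokens.
fraction = density_percent / 100

# Average number of tokens per activating token, under the assumed definition.
tokens_per_activation = 1 / fraction


def expected_activations(n_tokens: int) -> float:
    """Hypothetical helper: expected count of activating tokens in a sample."""
    return n_tokens * fraction


print(f"about 1 in {round(tokens_per_activation)} tokens")
print(f"about {expected_activations(1_000_000):.0f} activations per million tokens")
```

Under that assumption the neuron fires on roughly one token in 3,125, which is consistent with a sparse feature tied to a narrow suffix pattern.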