INDEX
Explanations
trained models
This neuron specifically detects the word “trained,” especially when it appears as part of “pre-trained.”
phrases describing the characteristics and functionalities of generative language models.
New Auto-Interp
Negative Logits
);}
-0.06
ayar
-0.06
McCoy
-0.06
invade
-0.06
_OVERRIDE
-0.06
okt
-0.06
pravděpodob
-0.06
#+#
-0.06
ringe
-0.06
-feature
-0.06
POSITIVE LOGITS
387
0.06
DBC
0.06
org
0.06
ikip
0.06
_Tag
0.06
val
0.06
량
0.06
θρώ
0.06
_len
0.06
Toronto
0.06
Activations Density 0.001%