Explanations
This neuron primarily detects occurrences of the standalone token “New” (as in place names like “New York” or “New Jersey”).
Negative Logits
veh      -0.07
Democr   -0.07
fcn      -0.07
educ     -0.07
/lic     -0.07
�        -0.06
доч      -0.06
rane     -0.06
申       -0.06
fc       -0.06
Positive Logits
New         0.09
numerical   0.07
PET         0.07
problem     0.07
postav      0.07
too         0.06
estate      0.06
_per        0.06
ome         0.06
getir       0.06
Activations Density 0.033%