INDEX
Explanations
This neuron detects occurrences of the word “sign” (and its plural “signs”) used in the sense of an indicator or cue.
Negative Logits
_sold      -0.08
imd        -0.07
rights     -0.07
sortable   -0.07
Tên        -0.06
acc        -0.06
ستم        -0.06
Eduardo    -0.06
-aged      -0.06
wed        -0.06
Positive Logits
ostream    0.07
db         0.07
(distance  0.06
_CUR       0.06
zurück     0.06
469        0.06
保证       0.06
vezes      0.06
ode        0.06
�          0.06
Activations Density 0.004%