Explanations
This neuron flags polite first-person desire or preference constructions, especially phrases like “I would like to ….”
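As a rough illustration of the kind of text this explanation describes, the sketch below uses a regular expression as a surface-level proxy for polite first-person desire constructions. The pattern, the verb list (like, love, prefer), and the function name `matches_polite_desire` are assumptions made for illustration only; they are not derived from the neuron, which may respond to a broader or narrower set of phrasings.

```python
import re

# Hypothetical proxy pattern: "I would ..." or "I'd ..." followed by a
# desire/preference verb. The verb list is an assumption, not taken from
# the neuron's actual activations.
POLITE_DESIRE = re.compile(
    r"\bI(?:\s+would|['’]d)\s+(?:really\s+)?(?:like|love|prefer)\b",
    re.IGNORECASE,
)


def matches_polite_desire(text: str) -> bool:
    """Return True if the text contains a polite first-person desire phrase."""
    return POLITE_DESIRE.search(text) is not None


examples = [
    "I would like to book a table for two.",
    "I'd prefer the earlier flight.",
    "The Ethernet cable is unplugged.",
]
flags = [matches_polite_desire(s) for s in examples]
```

A heuristic like this could serve as a quick sanity check when comparing the written explanation against top-activating examples, but it is no substitute for inspecting the activations themselves.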
Negative Logits
Token       Logit
kırmızı     -0.06
_axes       -0.06
ivering     -0.06
нения       -0.06
Osama       -0.06
Ethernet    -0.06
výkon       -0.06
vious       -0.06
등학교      -0.06
_fc         -0.06
Positive Logits
Token        Logit
ung          0.07
Rip          0.07
value        0.06
indic        0.06
Traditional  0.06
tr           0.06
clinic       0.06
ellery       0.06
cessive      0.06
ierrez       0.06
Activations Density: 0.013%
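To put the density figure in concrete terms, the sketch below converts it into an expected count of activating tokens. It assumes that the density is the fraction of sampled tokens on which the neuron has a nonzero activation; the page itself does not define the term, and the helper name `expected_active_tokens` is hypothetical.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of activating tokens, assuming the density is the
    fraction of tokens with nonzero activation (an assumption)."""
    return density_percent / 100.0 * n_tokens


density_percent = 0.013  # value reported on this page

# Expected activations in a sample of 100,000 tokens.
per_100k = expected_active_tokens(density_percent, 100_000)

# Average number of tokens between activations under the same assumption.
tokens_per_activation = 100.0 / density_percent

print(f"about {per_100k:.0f} activations per 100,000 tokens")
print(f"about one activation every {tokens_per_activation:.0f} tokens")
```

Under that reading, the neuron is sparse: it fires on roughly 13 of every 100,000 tokens.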