Explanations
prepositions
This neuron detects words that express personal interest, intent, or motivational stance (e.g. “interested,” “forced,” “stating,” “suggested,” “driven”).
Negative Logits
want        -0.07
Metro       -0.06
fa          -0.06
xp          -0.06
BUG         -0.06
-insert     -0.06
Northern    -0.06
“It         -0.06
ennial      -0.05
wanted      -0.05
Positive Logits
ALSO            0.07
TODAY           0.06
AM              0.06
contraction     0.06
chalk           0.06
coordinating    0.06
_owner          0.06
,每             0.06
_UNITS          0.06
(parameter      0.06
Activations Density 0.261%
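
The page does not define the density figure. A common reading is the share of sampled token positions on which the neuron fires at all, and the sketch below works under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the neuron is active.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which activate the neuron.
acts = [0.0] * 997 + [0.4, 1.2, 0.05]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.261% would mean the neuron is active on roughly 26 of every 10,000 sampled tokens.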