Explanations
The neuron activates on occurrences of the verb “want” (and a nearby “to”), i.e., on expressions of desire or intent.
Negative Logits
因            -0.06
toddlers      -0.06
Carn          -0.06
semicolon     -0.06
_nd           -0.06
_sid          -0.06
Id            -0.06
provoke       -0.06
—from         -0.06
contributes   -0.06
Positive Logits
interested    0.07
atitude       0.06
izzly         0.06
keen          0.06
oslav         0.06
متف           0.06
prince        0.06
.white        0.06
llib          0.06
ORMAT         0.06
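Lists like the two above are commonly produced by projecting a neuron's output weights through the model's unembedding and ranking the resulting per-token logit effects. The sketch below assumes that method; it is not confirmed by this page. The toy dimensions, weight values, and the helper names `logit_effects` and `top_and_bottom` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy model: d_model = 3, vocabulary of 5 tokens.
# All weights below are made up; they are NOT the real model's values.
w_out = [0.5, -0.2, 0.1]  # assumed: the neuron's output direction in the residual stream
unembed = {  # assumed: token -> unembedding column (length d_model)
    "interested": [0.4, 0.0, 0.2],
    "keen": [0.2, -0.1, 0.1],
    "toddlers": [-0.3, 0.1, 0.0],
    "provoke": [0.0, 0.4, -0.2],
    "the": [0.0, 0.0, 0.0],
}


def logit_effects(w_out, unembed):
    """Direct effect of one unit of neuron activation on each token's logit (dot product)."""
    return {tok: sum(a * b for a, b in zip(w_out, col)) for tok, col in unembed.items()}


def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    return ranked[:k], ranked[::-1][:k]


effects = logit_effects(w_out, unembed)
neg, pos = top_and_bottom(effects, 2)
print("negative:", [tok for tok, _ in neg])  # toddlers, provoke
print("positive:", [tok for tok, _ in pos])  # interested, keen
```

Because the effect is linear in the activation, a token's listed value can be read as how much its logit shifts per unit of neuron activation, under the stated assumption.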
Activation Density: 0.020%
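Activation density is usually the share of token positions at which the neuron fires. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0; the page does not state its threshold or dataset. The function name `activation_density` and the 10,000-token example are hypothetical. Two active positions out of 10,000 tokens would give a density of 0.020%.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold` (threshold is an assumption)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the neuron fires at 2 of 10,000 token positions.
acts = [0.0] * 10_000
acts[17] = 3.2
acts[4242] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.020%
```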