INDEX
Explanations
News articles
The neuron fires on tokens that introduce or label an individual actor or role—words like “agent,” “manager,” “player,” or the relative pronoun “who” marking a person doing something.
New Auto-Interp
Negative Logits
Imp
-0.07
_ext
-0.06
الأمر
-0.06
ekonomik
-0.06
038
-0.06
-Am
-0.06
SUP
-0.06
_dx
-0.06
Normally
-0.06
makes
-0.06
POSITIVE LOGITS
buttonShape
0.07
puted
0.06
пе
0.06
(sk
0.06
(Screen
0.06
(primary
0.06
club
0.06
.field
0.06
------------
0.06
¥
0.06
Activations Density 0.381%