Explanations
This neuron primarily activates on first-person pronouns and possessive forms (e.g. “I,” “my,” “our”).
Negative Logits
_PC
-0.07
Resource
-0.07
phot
-0.07
cancer
-0.07
capitalism
-0.07
взаим
-0.07
IRECTION
-0.07
-ahead
-0.07
Fully
-0.07
Send
-0.07
Positive Logits
者
0.06
heirs
0.06
ущ
0.06
ağa
0.06
:@{
0.06
={$
0.06
clicked
0.06
primeira
0.06
ΩΣ
0.06
azor
0.06
Activation Density: 0.152%
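
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron
    fires at all. The source page does not confirm this definition.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical data: 10,000 token positions, 15 of which fire.
sample = [0.0] * 9985 + [1.3] * 15
print(f"{activation_density(sample):.3%}")  # prints "0.150%"
```

Under this reading, a density of 0.152% would mean the neuron fires on roughly 15 of every 10,000 tokens, which is sparse enough that the top-activating examples carry most of the interpretive signal.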