Explanations
first-person
This neuron responds to first-person references (especially the pronoun “I”).
Negative Logits
mour          -0.07
diagnostic    -0.06
"";↵          -0.06
安            -0.06
tower         -0.06
%"↵           -0.06
↵↵            -0.06
outrage       -0.06
況            -0.06
Une           -0.06
Positive Logits
SOLUTION      0.07
liž           0.07
tenemos       0.06
scrolled      0.06
`),↵          0.06
fgets         0.06
onesia        0.06
ürlich        0.06
Ao            0.06
asured        0.06
Activation Density: 0.120%
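The page does not say how these numbers are produced. The sketch below is an assumption, not the dashboard's actual code. It takes a common convention: a unit's logit effect on token t is its output weight vector w_out dotted with column t of the unembedding matrix W_U. The positive and negative lists are then the top-k and bottom-k of that vector, with k=10 to match the lists above. Activation density is taken to be the percentage of token positions where the activation exceeds a threshold. The helper names and the threshold are hypothetical.

```python
def logit_effects(w_out, W_U):
    # Assumed convention: effect on token t = sum_i w_out[i] * W_U[i][t],
    # i.e. the unit's output direction projected onto each unembedding column.
    vocab_size = len(W_U[0])
    return [sum(w * row[t] for w, row in zip(w_out, W_U)) for t in range(vocab_size)]


def top_logit_effects(w_out, W_U, vocab, k=10):
    # Hypothetical helper: returns (negative, positive) lists of (token, effect),
    # most negative first and most positive first, respectively.
    effects = logit_effects(w_out, W_U)
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


def activation_density(acts, threshold=0.0):
    # Hypothetical helper: percentage of token positions whose activation
    # exceeds `threshold` (the threshold used by the dashboard is not stated).
    if not acts:
        raise ValueError("acts must be non-empty")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)
```

Under this reading, a density of 0.120% means the neuron fires on roughly 1.2 of every 1,000 tokens.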