Explanations
The neuron responds to feminine third-person references (e.g. “she,” “her,” “girl”).
Negative Logits
Token          Logit
Independent    -0.07
aga            -0.07
_Window        -0.07
肃             -0.07
               -0.07
울             -0.06
áp             -0.06
Income         -0.06
BASH           -0.06
rio            -0.06
Positive Logits
Token          Logit
Lieutenant     0.06
peril          0.06
Mein           0.06
ีผ             0.06
Sergeant       0.06
nướng          0.06
EntryPoint     0.06
خوب            0.06
ofrec          0.06
plaintiffs     0.06
Activations Density 0.120%
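
A density figure like the one above can be read as the share of token positions on which the neuron fires. The sketch below is only an illustration under that assumption: the page does not say how the number is computed, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of token positions where the
    neuron's activation exceeds the threshold, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up data: 10,000 token positions, 12 of which activate.
sample = [0.0] * 9988 + [1.5] * 12
density = activation_density(sample)
print(f"{density:.3f}%")  # prints 0.120%
```

Under this reading, a density of 0.120% would mean the neuron is active on roughly 12 of every 10,000 tokens, which fits a neuron that fires only on a narrow class of words.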