Explanations
This neuron detects occurrences of the word “gender” (often as a heading or key topic label).
Negative Logits
pies        -0.08
-line       -0.08
Note        -0.07
it          -0.07
ioc         -0.07
Acts        -0.07
onc         -0.06
ooks        -0.06
Streamer    -0.06
at          -0.06
Positive Logits
gender        0.16
Gender        0.14
Gender        0.14
gender        0.09
_gender       0.09
Geld          0.09
transgender   0.09
genders       0.08
geld          0.08
sex           0.08
Activation density: 0.007%
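
For readers unfamiliar with the density figure, here is a minimal Python sketch of one common way such a number can be computed. It assumes density means the fraction of sampled token positions where the neuron's activation is above zero; the page itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the neuron
    fires at all. The dashboard does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the neuron fires on 7 of 100,000 sampled tokens.
sample = [0.0] * 99_993 + [1.2] * 7
print(f"{activation_density(sample):.3%}")  # prints 0.007%
```

Under that assumption, a density of 0.007% corresponds to roughly 7 active tokens per 100,000, which fits a neuron that responds to a single, fairly rare word.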