INDEX
Explanations
sex and gender
The neuron is looking for mentions of gender-related terms.
Negative Logits
Frank     -0.08
usr       -0.07
Measure   -0.07
Jean      -0.07
การใช     -0.06
).*       -0.06
离开      -0.06
κατα      -0.06
by        -0.06
lên       -0.06
Positive Logits
sqlalchemy        0.06
.handlers         0.06
ần                0.06
dissatisfaction   0.06
Orchard           0.06
νηση              0.06
reff              0.06
znam              0.06
enroll            0.06
рив               0.06
Activations Density: 0.011%