INDEX
Explanations
gender-specific terms, predominantly focusing on the differentiation between male and female
references to gender, particularly distinguishing between male and female
Negative Logits
Token         Logit
mun           -0.73
imony         -0.64
Grassley      -0.63
76561         -0.62
Jub           -0.61
Warn          -0.60
mat           -0.60
jen           -0.59
&             -0.57
itle          -0.57
Positive Logits
Token         Logit
versa         0.85
alike         0.84
combatants    0.82
halves        0.74
senal         0.72
coasts        0.70
aiman         0.68
sides         0.67
peat          0.66
respectively  0.66
Activation density: 0.189%