INDEX
Explanations
references to female gender
Negative Logits
Token        Logit
lut          -0.38
ann          -0.37
Übersicht    -0.36
ery          -0.36
Rollback     -0.36
iter         -0.35
kt           -0.35
relse        -0.34
nos          -0.33
spol         -0.33
Positive Logits
Token        Logit
Female       1.05
Female       1.05
female       1.03
female       0.99
FEMALE       0.87
femenina     0.86
femeninos    0.86
kvinna       0.86
woman        0.85
FEMALE       0.83
Activations Density: 0.169%