Explanations
names and descriptors related to personal relationships and physical appearances
Negative Logits
Token            Logit
vrouwen          -0.50
women            -0.47
WOMEN            -0.46
resourceCulture  -0.45
Women            -0.45
Frauen           -0.44
women            -0.44
WOMEN            -0.43
женщин           -0.42
kvinnor          -0.41
Positive Logits
Token      Logit
tempt      0.69
beautiful  0.66
bim        0.65
pretty     0.63
siren      0.62
beauty     0.61
sultry     0.59
bombs      0.58
blonde     0.57
prettiest  0.57
Activations Density: 0.364%