Explanations
job titles followed by females
Negative Logits
represents    0.44
musician    0.41
manufacturer    0.40
पार्षद    0.40
hiker    0.40
<0x0D>    0.39
significantly    0.39
physik    0.38
geologist    0.38
innebär    0.37
Positive Logits
lady    0.55
اسمها    0.49
是个    0.48
сказала    0.48
вона    0.46
ఆమె    0.46
她说    0.45
היא    0.44
она    0.43
Lady    0.42
Activations Density 0.009%