Explanations
references to female athletes and their achievements
Negative Logits
僕は    -0.52
boy    -0.51
boys    -0.50
boy    -0.50
łem    -0.47
僕も    -0.47
僕    -0.45
ragazzi    -0.44
boys    -0.44
boyhood    -0.43
Positive Logits
她们    0.69
她們    0.65
AssemblyCompany    0.62
करती    0.57
juntas    0.57
hendes    0.55
feminism    0.54
姐妹    0.52
ovaries    0.52
feminist    0.52
Activations Density 0.286%