Explanations
phrases or concepts related to cultural and historical comparisons
Negative Logits
ngo               -0.18
bish              -0.16
EMALE             -0.16
agged             -0.16
crawler           -0.15
ManagerInterface  -0.14
Liberia           -0.14
Gül               -0.14
жен               -0.14
africa            -0.14
Positive Logits
Americans         0.42
Indians           0.35
Russians          0.34
Americans         0.34
Germans           0.34
Australians       0.33
Canadians         0.33
Texans            0.32
Koreans           0.31
Israelis          0.31
Activations Density: 0.579%