INDEX
Explanations
references to American identity and sociopolitical issues
Negative Logits
reform    -0.17
ulen      -0.15
avin      -0.15
ep        -0.15
oblin     -0.14
spins     -0.14
归        -0.14
Weg       -0.14
ier       -0.13
ila       -0.13
Positive Logits
American     0.32
American     0.28
Americans    0.27
America      0.26
merican      0.25
american     0.23
USA          0.23
アメリカ     0.23
-American    0.23
USA          0.22
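The page does not say how these logit values are produced. A common approach, assumed here, is to take the dot product of the feature's output direction with each token's unembedding vector and list the largest and smallest results. The sketch below illustrates that idea only; `logit_effects`, `top_and_bottom`, and the two-dimensional toy vectors are hypothetical and are not taken from the dashboard.

```python
def logit_effects(direction, unembed):
    """Dot the feature direction with each token's unembedding vector."""
    return {
        tok: sum(d * w for d, w in zip(direction, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Toy, made-up vectors: real models use thousands of dimensions.
direction = [1.0, 0.0]
unembed = {
    " American": [0.32, 0.1],
    "USA": [0.23, -0.4],
    "reform": [-0.17, 0.9],
}

effects = logit_effects(direction, unembed)
positive, negative = top_and_bottom(effects, k=1)
```

With these toy inputs, `positive` holds the token the feature promotes most and `negative` the token it suppresses most.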
Activations Density 0.189%
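Activation density is usually read as the fraction of sampled tokens on which the feature fires. Assuming that definition (the page itself does not state one), a minimal sketch looks like this; `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 1000 tokens, the feature fires on 2 of them.
sample = [0.0] * 998 + [1.5, 0.7]
density = activation_density(sample)
formatted = f"{density:.3%}"
```

Under this reading, a density of 0.189% means the feature is active on roughly 19 of every 10,000 sampled tokens.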