Explanations
references to societal and political issues related to race and identity
Negative Logits
oru            -0.15
лер            -0.14
朋             -0.14
惑             -0.14
FER            -0.14
ulen           -0.14
anter          -0.14
зь             -0.13
Tube           -0.13
Credentials    -0.13
Positive Logits
Jude           0.28
America        0.27
American       0.26
America        0.23
liberty        0.23
freedoms       0.23
Americans      0.23
American       0.22
greatness      0.22
Found          0.20
Activations Density 0.166%