INDEX
Explanations
cultural or ethnic groups and associated community events or activities
references to various ethnic and cultural identities
Negative Logits
xon                 -0.77
upiter              -0.74
rontal              -0.71
EH                  -0.68
mble                -0.66
ufact               -0.65
thumbnails          -0.65
guiActiveUnfocused  -0.64
NRS                 -0.64
FedEx               -0.64
Positive Logits
advocates     0.80
separatists   0.78
immigrants    0.77
activists     0.77
stereotypes   0.75
voices        0.75
supremacists  0.73
actors        0.73
feminists     0.72
bashing       0.71
Activations Density 0.343%