INDEX
Explanations
words related to specific nationalities
references to demographic groups and their sentiments or actions
Negative Logits
advertisement  -0.73
Subtle         -0.67
ĸļ             -0.66
LOD            -0.64
Dur            -0.64
Impro          -0.64
Engineers      -0.64
forcement      -0.63
CRC            -0.62
Prov           -0.61
Positive Logits
resent         1.45
distrust       1.38
sympath        1.35
despise        1.32
mistrust       1.31
disapprove     1.29
dislike        1.25
oppose         1.25
fear           1.22
abhor          1.21
Activations Density 0.315%
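
As a rough illustration of how a density figure like the one above could be read, the sketch below treats density as the fraction of tokens on which the feature's activation exceeds a threshold. This definition, the function name `activation_density`, the `threshold` parameter, and the sample values are all assumptions made for illustration; the page does not say how its number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition for illustration only; the source does not
    state how its density value is calculated.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over 8 tokens; the feature fires on 2 of them.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 25.000%
```

Under this reading, a density of 0.315% would mean the feature is active on roughly 3 of every 1,000 tokens sampled.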