INDEX
Explanations
phrases related to narratives and portrayals of events or individuals
references to portrayals and narratives relating to individuals or groups
Negative Logits
iland       -0.73
awoken      -0.71
resid       -0.69
ktop        -0.68
HOU         -0.65
clocks      -0.64
cyl         -0.63
resides     -0.62
drafts      -0.62
sterdam     -0.62
Positive Logits
blame       1.00
disrespect  0.91
discredit   0.90
hypocrisy   0.89
scapego     0.87
culp        0.87
å§          0.86
unfairly    0.83
racist      0.83
innocence   0.83
Activations Density: 0.630%