INDEX
Explanations
words related to reparations or negative actions imposed on individuals
mentions of reparations or related concepts of accountability and restitution
Negative Logits
Token           Logit
glers           -0.94
ERY             -0.81
Abyss           -0.75
ggle            -0.73
Ducks           -0.72
Cage            -0.70
Bruins          -0.69
GER             -0.67
GER             -0.65
tuberculosis    -0.65
Positive Logits
Token           Logit
utations        1.46
ublic           1.24
ulsive          1.19
roach           1.14
rieve           1.14
uted            1.13
rehensible      1.13
rint            1.13
orters          1.11
utation         1.11
Activations Density: 0.018%