INDEX
Explanations
mentions of social justice or reparations related to historical injustices
Negative Logits
izoph
-0.18
yere
-0.15
hari
-0.14
nod
-0.14
egas
-0.14
произ
-0.14
odcast
-0.14
平成
-0.14
談
-0.13
islav
-0.13
Positive Logits
online
0.23
0.20
posts
0.19
reactions
0.19
overnight
0.19
twe
0.17
community
0.17
Online
0.17
Twe
0.16
reaction
0.16
Activations Density 0.121%