INDEX
Explanations
Mentions of specific individuals in connection with social and political issues, particularly abuse and injustice.
Negative Logits
jid          -0.16
stoup        -0.16
одерж        -0.15
947          -0.15
_mentions    -0.15
COPYING      -0.15
undi         -0.15
éĢ           -0.15
ccione       -0.14
úng          -0.14
Positive Logits
ondo         0.16
utto         0.14
Decompiled   0.14
ests         0.14
.sb          0.14
itsu         0.14
ions         0.13
ion          0.13
Bom          0.13
Undert       0.13
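The logit lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way dashboards compute such scores, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each column of the model's unembedding matrix and keep the extremes. The following is a minimal sketch under that assumption; the function names (`logit_effects`, `top_and_bottom`) and the toy vectors are hypothetical and do not reproduce the real model's values.

```python
# Sketch only: ranks tokens by the assumed "logit effect" of one feature,
# i.e. the dot product of its decoder direction with each unembedding column.
# All names and the toy numbers below are hypothetical, not real model data.


def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, score) pairs with score = decoder_vec . unembed[:, j].

    decoder_vec: d_model floats (the feature's decoder direction)
    unembed:     d_model rows x vocab_size columns (W_U: residual -> logits)
    vocab:       vocab_size token strings
    """
    scores = []
    for j, token in enumerate(vocab):
        # Pair each decoder component with the matching row of W_U.
        score = sum(d * row[j] for d, row in zip(decoder_vec, unembed))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["ondo", "jid", "ion", "stoup"]
decoder_vec = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.1, -0.2],
    [0.0, 0.1, 0.05, 0.0],
]

negative, positive = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), k=2)
print("negative:", [tok for tok, _ in negative])  # ['stoup', 'jid']
print("positive:", [tok for tok, _ in positive])  # ['ondo', 'ion']
```

Sorting the full vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.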
Activation density: 0.010%
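A density of 0.010% means the feature is active on roughly 1 in 10,000 tokens. The sketch below assumes density is defined as the fraction of sampled token positions whose activation is strictly above zero; the `threshold` parameter and the toy activations are hypothetical illustrations, not the dashboard's actual data or code.

```python
# Sketch only: activation density as the fraction of sampled token positions
# on which a feature's activation exceeds a threshold (assumed to be 0 here).


def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("activation_density needs at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: the feature fires on 1 token out of 10,000 sampled tokens.
toy_activations = [0.0] * 9_999 + [2.5]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # 0.010%
```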