Explanations
Words related to accusations and criticism of actions taken by people in positions of authority
Negative Logits
MpServer        -0.78
ãĥīãĥ©ãĤ´ãĥ³    -0.74
CHO             -0.66
Magikarp        -0.65
senal           -0.64
ADRA            -0.61
Issue           -0.61
WHERE           -0.59
xon             -0.59
AAF             -0.59
Positive Logits
shire           1.13
ords            1.12
enegger         1.10
erent           1.04
ees             1.00
irms            0.98
ington          0.97
rey             0.95
yre             0.94
sburgh          0.92
Activations Density 0.009%