INDEX
Explanations
words related to riots and riot-related activities
Negative Logits
DonaldTrump  -0.77
bourg        -0.74
hran         -0.73
metics       -0.72
ournal       -0.69
omething     -0.68
sonian       -0.68
ccording     -0.65
hered        -0.65
mathemat     -0.65
Positive Logits
ous          1.00
naire        0.92
ers          0.91
ing          0.86
ously        0.86
auld         0.83
riot         0.82
rained       0.80
aries        0.78
tro          0.77
Activations Density: 0.008%