Explanations
references to riots or protests
Negative Logits
Token        Logit
bourg        -0.75
DonaldTrump  -0.71
metics       -0.67
hran         -0.66
hered        -0.66
ournal       -0.65
ULTS         -0.65
continents   -0.64
Parenthood   -0.64
NetMessage   -0.64
Positive Logits
Token        Logit
ous           1.02
naire         0.93
ers           0.91
ously         0.90
ing           0.88
tro           0.84
auld          0.83
aries         0.81
rained        0.79
er            0.79
Activation density: 0.083%
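Activation density is the share of tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero. The function name `activation_density` and the token counts are hypothetical and serve only to illustrate the arithmetic. A density of 0.083% corresponds to roughly 83 active tokens per 100,000.

```python
def activation_density(acts: list[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is > threshold.
    """
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Hypothetical data: 100,000 tokens, of which 83 have a positive activation.
acts = [0.0] * 100_000
for i in range(83):
    acts[i] = 1.0

print(f"{activation_density(acts):.3%}")  # 0.083%
```

Under this definition, a very low density like this one indicates a sparse feature that is inactive on almost every token.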