Explanations
references to uprisings and rebellions against oppressive regimes
Negative Logits
dater      -0.19
orne       -0.17
ahoma      -0.17
oux        -0.15
nds        -0.15
Boone      -0.15
tru        -0.15
arch       -0.15
Dispose    -0.14
Reese      -0.14
Positive Logits
Against    0.19
against    0.18
avirus     0.15
_again     0.14
)const     0.14
/self      0.13
Ñģамов     0.13
kie        0.13
exc        0.13
Thomas     0.13
Activations Density: 0.154%