INDEX
Explanations
Occurrences of the word "rebel" in various contexts
References to rebel groups and their activities
Negative Logits
Token         Logit
ographies     -0.85
thora         -0.82
gow           -0.79
orgetown      -0.78
Robo          -0.77
Safety        -0.75
Atmospheric   -0.75
Arthur        -0.75
ostics        -0.74
Compton       -0.72
Positive Logits
Token         Logit
rebels        1.08
factions      1.06
milit         1.05
rebel         1.03
militias      0.98
shelling      0.95
faction       0.94
uprising      0.92
army          0.90
regime        0.89
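The logit lists name the tokens whose output logits this feature most decreases and most increases. A minimal sketch of how such lists are commonly produced in sparse-autoencoder dashboards, assuming the scores come from projecting the feature's decoder direction through the model's unembedding matrix and ignoring the final layer norm. The source does not say how its scores were computed; `decoder_dir`, `W_U`, and `top_logit_effects` are hypothetical names.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the output logits.

    Hypothetical shapes: decoder_dir is (d_model,), W_U is (d_model, n_vocab),
    vocab is a list of n_vocab token strings. Final layer norm is ignored.
    """
    logits = decoder_dir @ W_U          # (n_vocab,) direct logit effect per token
    order = np.argsort(logits)          # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with a 2-dimensional model and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 2.0, 0.1],
                [9.0,  9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
```

Sorting the full logit vector once yields both ends of the ranking, which is why dashboards typically show negative and positive lists side by side.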
Activation density: 0.017%
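Activation density is usually reported as the share of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero over some sample of tokens; the source does not give its threshold or sample, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# A feature active on 17 of 100,000 sampled tokens has a density of 0.017%.
sample = np.zeros(100_000)
sample[:17] = 1.0
density = activation_density(sample)
```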