Explanations
references to destabilization in political contexts
Head Attr Weights
0:0.03
1:0.01
2:0.06
3:0.05
4:0.13
5:0.02
6:0.03
7:0.41
8:0.02
9:0.03
10:0.08
11:0.08
Negative Logits
aldo          -1.79
れ            -1.73
ilege         -1.60
864           -1.57
ruary         -1.56
FINE          -1.55
Veterinary    -1.47
PLIED         -1.46
に            -1.46
eln           -1.45
Positive Logits
regimes       1.64
unions        1.63
relations     1.59
spiral        1.57
weakened      1.57
structures    1.55
regions       1.54
ecosystems    1.47
alliances     1.46
region        1.43
Activations Density 0.001%