INDEX
Explanations
Usage of specific strategies or tactics in political contexts
Head Attr Weights
0:0.01
1:0.01
2:0.07
3:0.04
4:0.10
5:0.01
6:0.03
7:0.42
8:0.02
9:0.02
10:0.12
11:0.09
Negative Logits
elapsed      -1.33
cies         -1.32
Jere         -1.29
Simone       -1.29
abytes       -1.29
notes        -1.26
atural       -1.24
sensations   -1.24
ceilings     -1.24
worthy       -1.21
Positive Logits
guerrilla      1.70
sabotage       1.69
avoidance      1.68
ambush         1.56
infiltration   1.51
strategy       1.48
camouflage     1.48
engagement     1.45
attack         1.45
ategy          1.39
Activations Density: 0.019%