INDEX
Explanations
references to political figures and media coverage of their actions
Head Attr Weights
0:0.01
1:0.01
2:0.06
3:0.17
4:0.31
5:0.02
6:0.02
7:0.10
8:0.05
9:0.05
10:0.08
11:0.06
Negative Logits
Pis          -1.52
Malk         -1.52
ipel         -1.47
shorthand    -1.45
Syl          -1.43
oenix        -1.41
prest        -1.38
Lomb         -1.37
ainment      -1.35
versible     -1.34
Positive Logits
shortcomings       1.98
inaction           1.96
bullies            1.82
advances           1.75
perceived          1.69
failings           1.68
seeming            1.67
inconsistencies    1.63
intruder           1.63
aggress            1.62
Activation Density: 0.173%