INDEX
Explanations
connections to political figures and their actions
Head Attr Weights
0:0.15
1:0.02
2:0.16
3:0.06
4:0.03
5:0.08
6:0.07
7:0.08
8:0.18
9:0.03
10:0.02
11:0.06
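
A minimal sketch for reading the listing above, assuming each entry is an attention-head index followed by that head's attribution weight for this feature. The names `head_attr_weights` and `top_heads` are hypothetical and are not part of the dashboard. The displayed values sum to 0.94 rather than 1, presumably because each one is rounded to two decimals.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_attr_weights = {
    0: 0.15, 1: 0.02, 2: 0.16, 3: 0.06, 4: 0.03, 5: 0.08,
    6: 0.07, 7: 0.08, 8: 0.18, 9: 0.03, 10: 0.02, 11: 0.06,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the displayed (rounded) weights.
total = round(sum(head_attr_weights.values()), 2)

# The three heads with the largest displayed weights.
top = top_heads(head_attr_weights)

# Fraction of the displayed total carried by those three heads.
top_share = round(sum(w for _, w in top) / sum(head_attr_weights.values()), 2)

print(top, total, top_share)
```

On the displayed values, heads 8, 2, and 0 carry roughly half of the total weight.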
Negative Logits
nect  -2.95
̶  -2.88
lyn  -2.76
baskets  -2.75
Ashe  -2.75
Node  -2.72
�  -2.71
npm  -2.68
annie  -2.67
boarded  -2.67
Positive Logits
Schwarzenegger  7.66
enegger  7.08
Terminator  4.48
Arnold  4.15
Kung  3.48
Expend  3.38
Fiorina  3.32
Godzilla  3.32
Diet  3.29
Celeb  3.20
Activations Density 0.002%