INDEX
Explanations
names of political figures and notable individuals
Head Attr Weights
0:0.01
1:0.01
2:0.22
3:0.07
4:0.07
5:0.03
6:0.03
7:0.05
8:0.07
9:0.03
10:0.17
11:0.19
Negative Logits
affirmation   -1.30
oret          -1.23
salute        -1.18
reminder      -1.17
undown        -1.16
delet         -1.16
constitu      -1.15
lot           -1.13
didnt         -1.13
reminds       -1.12
Positive Logits
whom          1.39
[/            1.38
SPONSORED     1.27
Weather       1.22
cific         1.19
Sheldon       1.19
invading      1.15
Mutual        1.13
phy           1.12
helicop       1.11
Activations Density: 0.289%