INDEX
Explanations
phrases related to political comments and speeches
Head Attr Weights
0:0.03
1:0.01
2:0.04
3:0.10
4:0.06
5:0.07
6:0.03
7:0.03
8:0.06
9:0.08
10:0.31
11:0.11
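As a minimal sketch of how the head attribution weights above might be read, the snippet below ranks the heads by weight and sums them. The dictionary values are copied from the listing; the names `head_attr_weights` and `rank_heads` are hypothetical and not part of any dashboard API.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# The variable and function names here are illustrative assumptions.
head_attr_weights = {
    0: 0.03, 1: 0.01, 2: 0.04, 3: 0.10, 4: 0.06, 5: 0.07,
    6: 0.03, 7: 0.03, 8: 0.06, 9: 0.08, 10: 0.31, 11: 0.11,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
top_head, top_weight = ranked[0]
total = sum(head_attr_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

With the values as listed, head 10 carries the largest weight (0.31), followed by heads 11 and 3. The rounded weights sum to 0.93 rather than 1.00, which is likely down to rounding in the display.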
Negative Logits
extermin    -1.58
surv        -1.56
[&          -1.56
PCB         -1.55
LTD         -1.45
CRC         -1.40
MpServer    -1.40
strugg      -1.40
IPM         -1.39
upstream    -1.37
Positive Logits
Asked       1.58
Asked       1.57
Said        1.55
Note        1.47
uphem       1.46
Kimmel      1.44
joked       1.44
CNN         1.43
Foreign     1.42
Repeat      1.37
Activations Density 0.139%