INDEX
Explanations
tweets from Donald J. Trump
references to Donald J. Trump
Head Attr Weights
0:0.05
1:0.03
2:0.14
3:0.08
4:0.26
5:0.11
6:0.03
7:0.02
8:0.05
9:0.12
10:0.06
11:0.02
Negative Logits
inarily    -1.53
metadata   -1.44
imeters    -1.38
itored     -1.27
NK         -1.25
Riy        -1.23
dylib      -1.22
ranean     -1.21
Rated      -1.20
minecraft  -1.19
Positive Logits
teness     1.34
lander     1.33
SEA        1.31
Cole       1.31
expend     1.28
eas        1.27
�          1.25
hemer      1.19
letter     1.15
��         1.14
Activations Density 0.005%