Explanations
references to the Washington Post and its reporting
Head Attribution Weights (head:weight)
0:0.04
1:0.06
2:0.11
3:0.07
4:0.04
5:0.11
6:0.04
7:0.03
8:0.04
9:0.30
10:0.07
11:0.04
Negative Logits
onga          -1.58
neighbour     -1.41
neighbours    -1.37
uni           -1.30
gart          -1.28
ranged        -1.24
route         -1.23
rouse         -1.21
ranch         -1.20
range         -1.19
Positive Logits
Editorial     1.58
kefeller      1.50
��            1.37
Fact          1.35
ptroller      1.32
Watergate     1.30
�             1.29
Tyrann        1.28
leases        1.28
Pigs          1.28
Activations Density: 0.025%