Explanations
references to "The New York Times" in the text
Head Attr Weights
Head  Weight
0     0.24
1     0.03
2     0.02
3     0.04
4     0.17
5     0.17
6     0.05
7     0.01
8     0.16
9     0.04
10    0.00
11    0.01
Negative Logits
Token        Logit
loophole     -2.15
filibuster   -1.94
assumption   -1.92
fence        -1.91
shut         -1.83
closed       -1.75
veil         -1.74
ouf          -1.72
reservation  -1.72
threshold    -1.70
Positive Logits
Token        Logit
podcast      2.26
Dispatch     2.21
Trend        1.91
Cola         1.89
uters        1.86
Daily        1.82
icter        1.78
Week         1.77
(blank)      1.75
Kind         1.75
Activation Density: 0.002%