INDEX
Explanations
mentions of individuals, particularly journalists and reporters, in a news context
Head Attr Weights
0:0.14
1:0.02
2:0.02
3:0.03
4:0.25
5:0.15
6:0.05
7:0.01
8:0.13
9:0.12
10:0.01
11:0.02
Negative Logits
◼          -2.12
installed  -1.96
fulfill    -1.89
playbook   -1.88
ydia       -1.87
implement  -1.84
divest     -1.82
pledge     -1.81
execute    -1.79
declare    -1.77
Positive Logits
)—         2.21
}{         2.17
.")        2.09
dinand     2.07
)"         2.05
...)       2.03
.)         2.03
})         1.90
Contribut  1.89
.)         1.89
Activations Density 0.002%