INDEX
Explanations
mention of violent crimes and their perpetrators
Head Attr Weights
0:0.07
1:0.03
2:0.03
3:0.06
4:0.03
5:0.17
6:0.01
7:0.01
8:0.04
9:0.22
10:0.17
11:0.11
Negative Logits
domestically  -1.00
efeated       -1.00
odox          -0.98
otype         -0.94
oliberal      -0.92
izontal       -0.89
regate        -0.89
ogun          -0.88
ernels        -0.87
zsche         -0.86
Positive Logits
panicked      1.08
dding         1.01
retali        0.99
overheard     0.96
CONTIN        0.95
replied       0.95
Later         0.93
forwarded     0.92
intercepted   0.92
guessed       0.91
Activations Density 1.977%