Explanations
presence of news-related events and discussions
Head Attribution Weights
Head 0: 0.03
Head 1: 0.02
Head 2: 0.11
Head 3: 0.06
Head 4: 0.19
Head 5: 0.03
Head 6: 0.14
Head 7: 0.08
Head 8: 0.04
Head 9: 0.05
Head 10: 0.09
Head 11: 0.10
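To read the head attribution weights above at a glance, it helps to rank them. The sketch below only sorts and normalizes the displayed numbers; `HEAD_ATTR_WEIGHTS`, `top_heads`, and `share` are hypothetical names, not part of the dashboard. Note that the displayed values sum to about 0.94 rather than 1, so `share` normalizes by the displayed total.

```python
# Head attribution weights as displayed (head index -> weight).
HEAD_ATTR_WEIGHTS = {
    0: 0.03, 1: 0.02, 2: 0.11, 3: 0.06, 4: 0.19, 5: 0.03,
    6: 0.14, 7: 0.08, 8: 0.04, 9: 0.05, 10: 0.09, 11: 0.10,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, largest first."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]


def share(weights, head):
    """Fraction of the summed displayed weights contributed by one head."""
    return weights[head] / sum(weights.values())


if __name__ == "__main__":
    for head, weight in top_heads(HEAD_ATTR_WEIGHTS):
        print(f"head {head}: {weight:.2f} ({share(HEAD_ATTR_WEIGHTS, head):.0%} of total)")
```

With the values above, head 4 ranks first, followed by heads 6 and 2.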
Negative Logits
killers: -1.32
roo: -1.22
sinners: -1.22
bodily: -1.21
accompl: -1.17
disliked: -1.16
ococ: -1.15
killer: -1.14
tan: -1.13
flora: -1.13
Positive Logits
Editorial: 1.40
outset: 1.30
Synd: 1.29
═: 1.28
point: 1.27
December: 1.26
editorial: 1.25
cture: 1.22
ORN: 1.21
verse: 1.18
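Feature dashboards of this kind typically build the positive and negative logit lists by projecting the feature's output direction in the residual stream through the unembedding matrix (W_U) and keeping the highest and lowest scores. The sketch below assumes that procedure and ignores any final layer norm; `logit_weights`, `top_and_bottom`, and the toy inputs are hypothetical and are not taken from this dashboard.

```python
def logit_weights(direction, unembed):
    """Project a residual-stream direction through the unembedding.

    direction: list of d_model floats (assumed: the feature's output direction).
    unembed: d_model x vocab matrix given as a list of rows (W_U).
    Returns one score per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, tokens, k=10):
    """Return (highest k, lowest k) (token, score) pairs.

    The highest list is sorted descending; the lowest list is sorted
    ascending (most negative first), matching the order of the lists above.
    """
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    return ranked[::-1][:k], ranked[:k]


if __name__ == "__main__":
    # Toy example (assumed): d_model = 2, vocabulary of three tokens.
    direction = [1.0, -0.5]
    unembed = [[1.0, 0.0, -1.0],
               [0.0, 2.0, 1.0]]
    scores = logit_weights(direction, unembed)
    positive, negative = top_and_bottom(scores, ["a", "b", "c"], k=1)
    print(positive, negative)
```

In a real model the vocabulary has tens of thousands of tokens, so a vectorized matrix product would replace the nested loops; the pure-Python version is kept only to make the projection explicit.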
Activation Density: 0.006%
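Activation density is usually read as the percentage of sampled token positions on which the feature's activation is nonzero. Assuming that definition, 0.006% corresponds to roughly 6 firing positions per 100,000 tokens. A minimal sketch; the function name `activation_density` and the zero threshold are assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100 * firing / len(activations)


if __name__ == "__main__":
    # Toy data (assumed): 6 firing positions out of 100,000 sampled tokens.
    acts = [0.0] * 100_000
    for pos in (10, 2_000, 31_337, 50_000, 77_777, 99_999):
        acts[pos] = 1.0
    print(f"{activation_density(acts):.3f}%")  # prints "0.006%"
```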