INDEX
Explanations
phrases related to whistleblowing or revealing secret information
instances of whistleblowing and references to deadlines in various contexts
Head Attribution Weights
Head  Weight
0     0.09
1     0.03
2     0.05
3     0.10
4     0.03
5     0.10
6     0.03
7     0.02
8     0.28
9     0.13
10    0.05
11    0.03
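The weights above appear to give each attention head's share of this feature. The dashboard does not state its formula, so what follows is only a minimal sketch under two assumptions: the feature's decoder vector is the concatenation of equal-length per-head slices, and a head's weight is its slice's L2 norm divided by the sum of all slice norms. The function `head_attribution_weights` is hypothetical.

```python
import math
from typing import List


def head_attribution_weights(decoder_vec: List[float], n_heads: int) -> List[float]:
    """Return each head's share of a feature's decoder vector.

    ASSUMPTIONS (not confirmed by the dashboard): the vector is the
    concatenation of n_heads equal slices, one per head, and a head's
    share is its slice's L2 norm over the sum of all slice norms.
    """
    if n_heads <= 0 or len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder length must be a positive multiple of n_heads")
    d_head = len(decoder_vec) // n_heads
    # L2 norm of each head's slice of the decoder vector.
    norms = [
        math.sqrt(sum(x * x for x in decoder_vec[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0.0:
        # An all-zero decoder vector attributes nothing to any head.
        return [0.0] * n_heads
    return [n / total for n in norms]


# Toy example: 3 heads with 2 dimensions each; slice norms are 5, 0 and 10.
print(head_attribution_weights([3.0, 4.0, 0.0, 0.0, 6.0, 8.0], n_heads=3))
# → approximately [0.333, 0.0, 0.667]
```

Normalizing by the total makes the weights comparable across features regardless of decoder scale. A squared-norm variant would also be reasonable, and the sketch does not claim which one the dashboard uses.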
Negative Logits
Token       Logit
Statement   -1.24
cci         -1.15
SPONSORED   -1.06
coun        -1.04
cially      -1.03
TABLE       -1.03
tion        -0.99
KNOWN       -0.97
seq         -0.97
assadors    -0.95
Positive Logits
Token       Logit
lid          1.33
eyeb         1.25
geist        1.24
inis         1.08
:(           1.07
erella       1.06
whistle      1.05
andel        1.05
emale        1.05
dirt         1.04
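The two logit tables appear to list the tokens whose logits this feature most decreases and most increases. One common way to produce such lists is to score every vocabulary token by the dot product of the feature's output direction with that token's unembedding vector, then take the lowest and highest scores. This is a minimal sketch of that approach, not the dashboard's confirmed method. It assumes the output direction and unembedding vectors are given and ignores the final LayerNorm. The helper `top_logit_tokens` and the toy vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def top_logit_tokens(
    direction: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[Scored, Scored]:
    """Return the k most negative and k most positive (token, score) pairs.

    Each token's score is the dot product of the feature's output
    direction with the token's unembedding vector (ASSUMPTION: no
    LayerNorm scaling is applied).
    """
    scores = [
        (tok, sum(d * u for d, u in zip(direction, vec)))
        for tok, vec in unembed.items()
    ]
    scores.sort(key=lambda pair: pair[1])  # ascending by score
    negative = scores[:k]
    positive = list(reversed(scores))[:k]  # descending by score
    return negative, positive


# Toy vocabulary with 2-dimensional unembedding vectors.
toy_unembed = {
    "whistle": [2.0, 1.0],
    "Statement": [-1.5, 0.0],
    "dirt": [0.5, 3.0],
}
neg, pos = top_logit_tokens([1.0, 0.0], toy_unembed, k=1)
print(neg, pos)
```

Sorting once and slicing both ends keeps the sketch simple. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.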
Activation Density: 0.012%
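Activation density is usually the percentage of token positions on which a feature fires. This is a minimal sketch, assuming "fires" means an activation strictly above zero, which is an assumption rather than something the dashboard states. The helper `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions whose activation exceeds threshold.

    ASSUMPTION: a feature counts as active when its activation is
    strictly greater than `threshold` (default 0.0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return 100.0 * active / total


# 12 active positions out of 100,000 tokens.
print(activation_density([0.0] * 99_988 + [1.0] * 12))  # → approximately 0.012
```

At 0.012%, the feature is active on roughly 12 of every 100,000 tokens, or about 1 token in 8,300. That fits the narrow whistleblowing-related explanations listed above.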