INDEX
Explanations
references to news articles and sources related to reporting or ongoing stories
Head Attr Weights
0:0.05
1:0.10
2:0.03
3:0.02
4:0.06
5:0.14
6:0.05
7:0.03
8:0.04
9:0.34
10:0.04
11:0.04
Negative Logits
olls      -1.53
atever    -1.50
¯¯        -1.48
axis      -1.47
arters    -1.46
ilts      -1.45
cially    -1.44
bably     -1.44
adra      -1.44
erella    -1.42
Positive Logits
Replay         1.60
Updated        1.45
Telegram       1.42
Quotes         1.42
emoji          1.42
Deng           1.38
DVDs           1.36
OECD           1.36
newsletters    1.36
anu            1.34
Activations Density 0.036%