INDEX
Explanations
sentences indicating official announcements or statements
Head Attr Weights
0:0.08
1:0.03
2:0.04
3:0.04
4:0.06
5:0.03
6:0.18
7:0.04
8:0.09
9:0.28
10:0.02
11:0.03
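
The head attribution weights listed above can be ranked to see which heads contribute most. The following is a minimal sketch, not part of the original dashboard: the dictionary name `head_weights` and the ranking approach are illustrative assumptions, and the values are copied from the listing. The source does not say whether the weights are meant to sum to 1, so the sketch reports shares of the listed total.

```python
# Head attribution weights copied from the listing (head index -> weight).
# The variable names here are hypothetical, chosen only for illustration.
head_weights = {
    0: 0.08, 1: 0.03, 2: 0.04, 3: 0.04, 4: 0.06, 5: 0.03,
    6: 0.18, 7: 0.04, 8: 0.09, 9: 0.28, 10: 0.02, 11: 0.03,
}

# Sort heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)

# Sum of the listed (rounded) weights; it need not equal exactly 1.
total = sum(head_weights.values())

# Share of the listed total carried by the two largest heads.
top_two_share = sum(weight for _, weight in ranked[:2]) / total

for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f}")
print(f"listed total: {total:.2f}, top-two share: {top_two_share:.0%}")
```

On these values, heads 9 and 6 carry the largest weights and together account for roughly half of the listed total.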
Negative Logits
Roz         -4.11
cryptoc     -4.04
Pol         -3.99
Warsaw      -3.71
ł           -3.69
catentry    -3.62
ulz         -3.60
Poland      -3.56
[&          -3.51
Activision  -3.45
Positive Logits
Henderson   10.56
Hend         7.67
hend         5.60
Hendricks    5.13
Edgar        4.71
Petersen     4.67
Hem          4.39
Kend         4.12
Hume         4.06
horn         4.01
Activations Density 0.002%