Explanations
references to prominent organizations, entities, and events
Head Attr Weights
0:0.10
1:0.21
2:0.04
3:0.04
4:0.04
5:0.23
6:0.03
7:0.02
8:0.07
9:0.06
10:0.05
11:0.05
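As a reading aid, the weights above can be ranked to see which heads contribute most. The following is a minimal sketch, not part of the original dashboard: the names `head_attr_weights` and `rank_heads` are hypothetical, and the values are copied from the listing as displayed (rounded to two decimals).

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Values are the rounded figures as displayed, not the underlying raw numbers.
head_attr_weights = {
    0: 0.10, 1: 0.21, 2: 0.04, 3: 0.04,
    4: 0.04, 5: 0.23, 6: 0.03, 7: 0.02,
    8: 0.07, 9: 0.06, 10: 0.05, 11: 0.05,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by descending weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
total = sum(head_attr_weights.values())

# Print the three heads with the largest listed weight.
for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f}")
print(f"sum of listed weights: {total:.2f}")
```

Under these listed values, heads 5 and 1 carry the largest weights (0.23 and 0.21), and the twelve displayed weights sum to 0.94, presumably because of rounding.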
Negative Logits
lement      -2.13
alle        -1.90
ーティ      -1.83
ances       -1.82
omething    -1.80
ily         -1.79
andra       -1.78
lem         -1.78
atson       -1.76
alam        -1.76
Positive Logits
PRES        2.16
publishes   2.05
inaug       1.99
Publishing  1.90
PRESS       1.89
Presents    1.89
coined      1.88
Masquerade  1.83
organising  1.77
Pub         1.77
Activations Density 0.003%