INDEX
Explanations
references to political entities and events
Head Attr Weights
0:0.05
1:0.13
2:0.05
3:0.04
4:0.04
5:0.04
6:0.17
7:0.04
8:0.04
9:0.28
10:0.02
11:0.04
Negative Logits
Bie: -3.48
Kafka: -3.47
pire: -3.37
rapp: -3.31
Mara: -3.27
raft: -3.26
peer: -3.26
Sloven: -3.22
Jar: -3.19
rak: -3.17
Positive Logits
General: 6.24
General: 5.84
general: 5.27
GENERAL: 5.13
GM: 4.62
general: 4.39
generals: 4.10
GE: 4.03
GM: 4.03
GC: 3.96
Activations Density: 0.006%