INDEX
Explanations
contexts related to political positions and governmental roles
Head Attr Weights
0:0.04
1:0.04
2:0.41
3:0.03
4:0.03
5:0.02
6:0.05
7:0.05
8:0.03
9:0.03
10:0.16
11:0.05
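
The head attribution weights above can be ranked to see which attention heads contribute most to this feature. The following is a minimal sketch, not part of the original dashboard: the values are copied from the listing, and the names `head_weights` and `rank_heads` are hypothetical, introduced only for illustration.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.04, 1: 0.04, 2: 0.41, 3: 0.03, 4: 0.03, 5: 0.02,
    6: 0.05, 7: 0.05, 8: 0.03, 9: 0.03, 10: 0.16, 11: 0.05,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
total = sum(head_weights.values())

# Report the two heads with the largest listed weights and the overall sum.
for head, weight in ranked[:2]:
    print(f"head {head}: {weight:.2f}")
print(f"sum of listed weights: {total:.2f}")
```

On the listed values, head 2 (0.41) ranks first and head 10 (0.16) second. The displayed weights sum to 0.94, which suggests they are rounded for display.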
Negative Logits
GOODMAN:-2.77
iosyn:-2.58
simul:-2.43
plaza:-2.40
<@:-2.27
awa:-2.22
poppy:-2.16
ベ:-2.14
autom:-2.13
milo:-2.10
Positive Logits
d:3.79
D:3.72
ds:3.52
Ds:3.39
DD:3.33
DF:3.32
DERR:3.21
dd:3.11
dal:3.10
DL:3.04
Activations Density 0.018%