INDEX
Explanations
concepts related to modern ideologies and societal issues
Head Attr Weights
0:0.02
1:0.01
2:0.35
3:0.09
4:0.07
5:0.05
6:0.06
7:0.03
8:0.04
9:0.06
10:0.09
11:0.07
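
The head attribution weights above can be read programmatically. The following is a minimal sketch, assuming the "head:weight" pairs are available as whitespace-separated text; the helper name parse_head_weights and the variable names are hypothetical and not part of any dashboard API. It parses the pairs and finds the head with the largest weight.

```python
# Head attribution weights copied from the listing above ("head:weight").
RAW_WEIGHTS = (
    "0:0.02 1:0.01 2:0.35 3:0.09 4:0.07 5:0.05 "
    "6:0.06 7:0.03 8:0.04 9:0.06 10:0.09 11:0.07"
)


def parse_head_weights(text):
    """Parse whitespace-separated 'head:weight' pairs into a dict.

    Hypothetical helper for illustration only.
    """
    weights = {}
    for pair in text.split():
        head, weight = pair.split(":")
        weights[int(head)] = float(weight)
    return weights


weights = parse_head_weights(RAW_WEIGHTS)

# Head index with the largest attribution weight.
top_head = max(weights, key=weights.get)

# The listed values are rounded, so the total need not be exactly 1.
total = sum(weights.values())

# Share of the listed total carried by the top head.
top_share = weights[top_head] / total

print(f"top head: {top_head}, weight: {weights[top_head]:.2f}, "
      f"share of listed total: {top_share:.1%}")
```

On these values, head 2 carries the largest weight (0.35), several times that of any other head.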
Negative Logits
sidx       -1.89
thereof    -1.46
thereto    -1.40
afore      -1.34
ヴァ       -1.32
laun       -1.31
liest      -1.29
assad      -1.27
zl         -1.24
elfth      -1.22
Positive Logits
abound     2.03
ⓘ          1.58
¶          1.49
geries     1.39
meanwhile  1.36
sprang     1.35
indeed     1.33
Redux      1.32
rife       1.32
cropped    1.32
Activations Density 0.718%