INDEX
Explanations
references to historical events and figures
Head Attr Weights
0:0.02
1:0.16
2:0.10
3:0.03
4:0.02
5:0.05
6:0.09
7:0.06
8:0.09
9:0.08
10:0.06
11:0.20
Negative Logits
arters         -1.63
Patreon        -1.28
orable         -1.24
ASAP           -1.21
Chair          -1.08
��             -1.04
Advertisement  -1.03
preferably     -1.03
Artificial     -1.02
arial          -1.01
Positive Logits
}}             1.24
vanished       1.23
perceive       1.19
eff            1.17
ynt            1.15
yll            1.11
ois            1.11
ect            1.10
perceived      1.10
perce          1.09
Activations Density 0.027%