Explanations
calls to action or prompts to click for more information
Head Attribution Weights
Head  Weight
0     0.02
1     0.02
2     0.04
3     0.08
4     0.07
5     0.03
6     0.32
7     0.08
8     0.03
9     0.05
10    0.08
11    0.11
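The weights above are dominated by a single head. The page does not say how the weights are normalized (the listed values sum to 0.93, not 1), so the following minimal sketch simply takes the values as printed, picks the largest one, and reports its share of the listed total. The names `HEAD_ATTR_WEIGHTS` and `dominant_head` are hypothetical.

```python
# Head attribution weights as listed on the page (head index -> weight).
HEAD_ATTR_WEIGHTS = {0: 0.02, 1: 0.02, 2: 0.04, 3: 0.08, 4: 0.07, 5: 0.03,
                     6: 0.32, 7: 0.08, 8: 0.03, 9: 0.05, 10: 0.08, 11: 0.11}


def dominant_head(weights):
    """Return (head, weight, share_of_total) for the largest listed weight."""
    head = max(weights, key=weights.get)
    total = sum(weights.values())  # 0.93 for the values above, not 1.0
    return head, weights[head], weights[head] / total


head, weight, share = dominant_head(HEAD_ATTR_WEIGHTS)
print(f"head {head}: weight {weight:.2f}, {share:.0%} of the listed total")
```

For the values shown this picks head 6, which carries roughly a third of the listed attribution.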
Negative Logits
Token           Logit
opausal         -1.28
purposes        -1.16
inguished       -1.16
ufact           -1.12
icable          -1.10
representation  -1.10
comprise        -1.08
sole            -1.08
clerosis        -1.07
duties          -1.07
Positive Logits
Token           Logit
Refresh         1.51
bye             1.29
��              1.24  (undecodable byte token)
comma           1.20
sidx            1.19
dice            1.18
witz            1.17
Else            1.16
Rebels          1.13
Quotes          1.12
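The page does not state how the two logit lists are produced. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and sort the resulting per-token scores; the sketch below assumes that approach. It uses random toy matrices and placeholder token names (`tok0`, `tok1`, ...), so its output does not reproduce the values above. The function name `top_logits` is hypothetical.

```python
import numpy as np


def top_logits(decoder_dir, W_U, vocab, k=10):
    """Project a feature's decoder direction through the unembedding
    matrix and return the k most negative and k most positive tokens."""
    logits = decoder_dir @ W_U           # shape: (vocab_size,)
    order = np.argsort(logits)           # indices sorted ascending by logit
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos


# Toy setup with hypothetical sizes: d_model = 4, vocabulary of 6 tokens.
rng = np.random.default_rng(0)
vocab = [f"tok{i}" for i in range(6)]
W_U = rng.normal(size=(4, len(vocab)))   # stand-in unembedding matrix
decoder_dir = rng.normal(size=4)         # stand-in decoder direction
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```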
Activation Density: 0.007%
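A density of 0.007% is a fraction of 7e-5, or about 7 firing positions per 100,000 tokens. The sketch below assumes density means the fraction of token positions where the activation is strictly above zero; the page does not give the exact threshold or the size of the dataset it was measured on. The function name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Synthetic example: the feature fires on 7 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:7] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```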