INDEX
Explanations
expressions of disappointment or surprise
Head Attr Weights
0:0.08
1:0.13
2:0.03
3:0.08
4:0.03
5:0.15
6:0.05
7:0.01
8:0.13
9:0.11
10:0.08
11:0.06
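The head attribution weights above can be read programmatically. The following is a minimal sketch, assuming each line has the form `head:weight` exactly as listed; the names `RAW`, `parse_head_weights`, and `rank_heads` are hypothetical helpers introduced for illustration and are not part of any dashboard API.

```python
# Head attribution weights copied verbatim from the listing ("head:weight").
RAW = """0:0.08
1:0.13
2:0.03
3:0.08
4:0.03
5:0.15
6:0.05
7:0.01
8:0.13
9:0.11
10:0.08
11:0.06"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a {head_index: weight} dict."""
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, value = line.split(":", 1)
        weights[int(head)] = float(value)
    return weights


def rank_heads(weights):
    """Sort heads by descending weight, breaking ties by head index."""
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


weights = parse_head_weights(RAW)
ranked = rank_heads(weights)
total = sum(weights.values())

for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f}")
print(f"sum of listed weights: {total:.2f}")
```

With the values listed, head 5 ranks first (0.15), followed by heads 1 and 8 (0.13 each). The listed weights sum to about 0.94, not 1; this may be a rounding effect of the two-decimal display, but the source does not say.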
Negative Logits
20439      -1.62
climate    -1.50
onom       -1.44
Surviv     -1.42
urst       -1.39
assis      -1.36
issance    -1.35
ullivan    -1.34
ktop       -1.32
onut       -1.32
Positive Logits
Tags       1.27
…"         1.24
circled    1.23
knots      1.22
�          1.17
0004       1.16
XXX        1.14
Poker      1.14
POW        1.14
hanged     1.13
Activations Density 0.030%