INDEX
Explanations
expressions of surprise, ability, and emotions related to happiness
Head Attr Weights
0:0.03
1:0.04
2:0.41
3:0.04
4:0.01
5:0.03
6:0.04
7:0.06
8:0.11
9:0.10
10:0.05
11:0.05
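
The head attribution weights above can be read programmatically. The following is a minimal sketch, not part of the original dashboard: the values are copied from the list, while the names `head_weights` and `rank_heads` are hypothetical and introduced only for illustration. It assumes each weight is that head's share of the feature's attribution. The listed values sum to 0.97 rather than 1.00, presumably because of rounding.

```python
# Head attribution weights copied from the list above (head index -> weight).
# Assumption: each weight is that head's share of the feature's attribution.
head_weights = {
    0: 0.03, 1: 0.04, 2: 0.41, 3: 0.04, 4: 0.01, 5: 0.03,
    6: 0.04, 7: 0.06, 8: 0.11, 9: 0.10, 10: 0.05, 11: 0.05,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]
total = sum(head_weights.values())

print(f"dominant head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
for head, weight in ranked[:3]:
    # Share of the listed total carried by each of the three largest heads.
    print(f"head {head}: {weight / total:.1%} of listed total")
```

Under these assumptions, head 2 dominates, with heads 8 and 9 a distant second and third.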
Negative Logits
Directive       -1.20
plague          -1.16
precaution      -1.14
Lum             -1.13
Internal        -1.13
rarely          -1.12
tradition       -1.11
bount           -1.10
selection       -1.10
characteristic  -1.09
Positive Logits
angered         1.68
someday         1.64
icent           1.58
ワン            1.57
icted           1.55
stories         1.52
itialized       1.50
atre            1.46
ギ              1.43
orthy           1.42
Activations Density 0.174%