INDEX
Explanations
exclamatory phrases and expressions of surprise or emphasis
Head Attr Weights
0:0.02
1:0.02
2:0.06
3:0.08
4:0.16
5:0.04
6:0.06
7:0.25
8:0.05
9:0.05
10:0.07
11:0.10
Negative Logits
ワン: -1.73
pert: -1.46
unte: -1.36
vere: -1.31
ACTION: -1.30
Orig: -1.29
andom: -1.28
essor: -1.27
actionGroup: -1.27
fle: -1.26
Positive Logits
LOS: 1.51
gaps: 1.45
ija: 1.45
trickle: 1.44
glaring: 1.43
heavens: 1.36
lasses: 1.35
anwhile: 1.32
backwards: 1.28
footsteps: 1.27
Activations Density: 0.001%