INDEX
Explanations
phrases indicating a sense of difficulty or challenge
Head Attr Weights
0:0.01
1:0.03
2:0.14
3:0.13
4:0.01
5:0.03
6:0.05
7:0.11
8:0.12
9:0.17
10:0.05
11:0.08
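The head attribution weights above can be ranked to see which attention heads contribute most to this feature. The following is a minimal sketch, not the dashboard's own code: the values are copied from the list above, and the names `head_weights` and `top_heads` are hypothetical.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_weights = {
    0: 0.01, 1: 0.03, 2: 0.14, 3: 0.13, 4: 0.01, 5: 0.03,
    6: 0.05, 7: 0.11, 8: 0.12, 9: 0.17, 10: 0.05, 11: 0.08,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights (hypothetical helper)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_weights)
total = sum(head_weights.values())

for head, weight in top:
    print(f"head {head}: {weight:.2f}")
print(f"sum of listed weights: {total:.2f}")
```

Heads 9, 2, and 3 carry the largest weights. The listed values sum to about 0.93 rather than 1, which may simply reflect rounding to two decimals.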
Negative Logits
overest:-1.10
introduced:-1.08
except:-1.01
introducing:-1.00
bust:-0.99
exagger:-0.98
teased:-0.97
tease:-0.96
frown:-0.94
imar:-0.94
Positive Logits
ゼ:1.29
裏�:1.25
ット:1.23
fficiency:1.22
り:1.20
�:1.20
oplan:1.18
whereabouts:1.18
appiness:1.17
otrop:1.16
Activations Density 0.009%
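One common reading of an activation density figure is the fraction of tokens on which the feature fires at all. The sketch below assumes that definition. The dashboard does not state how its number is computed, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic illustration only: 9 active tokens out of 100,000.
acts = [0.0] * 99_991 + [1.5] * 9
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

Under this assumed definition, a density of 0.009% would correspond to roughly 9 active tokens per 100,000.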