INDEX
Explanations
phrases indicating comparison or difficulty in achieving tasks
Head Attr Weights
0:0.11
1:0.03
2:0.01
3:0.16
4:0.13
5:0.04
6:0.06
7:0.05
8:0.23
9:0.05
10:0.03
11:0.05
Negative Logits
weeney        -2.29
actionGroup   -2.22
eson          -2.12
ureen         -2.10
bol           -2.09
entin         -2.07
letters       -2.04
nesday        -2.03
milo          -1.95
Dispatch      -1.87
Positive Logits
unheard       2.01
forgiven      1.92
thanks        1.89
appreciated   1.87
.",           1.81
profitable    1.81
!".           1.81
certs         1.77
!",           1.73
true          1.72
Activations Density 0.001%