INDEX
Explanations
expressions of enthusiasm or positive sentiment
Negative Logits
Otherwise            -0.18
otherwise            -0.18
Else                 -0.17
ilarity              -0.17
Otherwise            -0.16
ãģĿãĤĮãģ¯ (それは)   -0.15
inee                 -0.15
otherwise            -0.15
Thus                 -0.15
Thus                 -0.14
Positive Logits
especially           0.28
especially           0.23
pity                 0.21
íĬ¹íŀĪ (특히)        0.20
even                 0.20
Especially           0.19
wish                 0.19
considering          0.18
wish                 0.18
Wish                 0.18
Activations Density 0.245%
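
Two entries in the logit lists, `ãģĿãĤĮãģ¯` and `íĬ¹íŀĪ`, look garbled because they appear to be shown in a byte-level BPE alphabet rather than as decoded text. The sketch below assumes a GPT-2-style byte-to-character table (the page does not name the tokenizer, so this is an assumption); `bytes_to_unicode` and `decode_token` are illustrative helper names, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build a GPT-2-style map from each byte value (0-255) to a printable character.

    Assumption: the token strings were produced by a GPT-2-style byte-level BPE.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points starting at 256.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a byte-level token string back into UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãģĿãĤĮãģ¯"))  # それは
    print(decode_token("íĬ¹íŀĪ"))      # 특히
```

Under that assumption the two tokens decode to Japanese それは ("that is") and Korean 특히 ("especially"); the latter sits naturally beside the English "especially" tokens in the positive list. Plain ASCII tokens pass through unchanged.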