Explanations
phrases related to disappointment and dissatisfaction
Negative Logits
ena          -0.15
“            -0.14
::↵          -0.13
thanks       -0.13
kop          -0.13
:↵           -0.13
Indeed       -0.13
ifr          -0.12
below        -0.12
ëį°ìĿ´íĬ¸    -0.12
Positive Logits
[            0.24
[s           0.23
[$           0.23
[,]          0.18
[in          0.18
['           0.17
[to          0.17
[<           0.17
[_           0.17
[`           0.17
Activations Density 2.423%