INDEX
Explanations
expressions of willingness to help and provide assistance
Negative Logits
pest         -0.20
.adapters    -0.17
ptest        -0.17
(CG          -0.16
ogui         -0.15
nist         -0.15
ãĥ¬ãĤ¹     -0.15
ingo         -0.14
oldem        -0.14
embali       -0.14
Positive Logits
pleasure     0.32
happy        0.28
happiness    0.27
Glad         0.26
Happy        0.26
happy        0.24
Happy        0.23
glad         0.23
joy          0.22
willingness  0.22
Activations Density: 0.137%
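
As a rough illustration of what a density figure like this could mean, the sketch below assumes that density is the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; the page itself does not state how the value is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: the source does not say how its density is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 137 of which fire.
sample = [0.0] * 99_863 + [1.5] * 137
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.137% would mean the feature fires on roughly 1 in 730 tokens, which fits a feature tied to a narrow family of phrases.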