Explanations
expressions of negativity or dissatisfaction
Negative Logits
unden            -0.16
Really           -0.16
действительно    -0.16
Really           -0.15
ullo             -0.15
نسب              -0.15
almost           -0.15
ninger           -0.15
almost           -0.14
undan            -0.14
Positive Logits
flattering    0.22
ideal         0.20
conducive     0.20
pleasant      0.19
thrilled      0.19
appet         0.19
kosher        0.19
glamorous     0.19
optimal       0.19
savory        0.19
Activations Density: 0.114%