Explanations
expressions of gratitude and appreciation
Negative Logits
Token     Logit
ascript   -0.17
oir       -0.16
ãng       -0.15
hte       -0.15
rror      -0.14
stav      -0.14
リア      -0.14
eteor     -0.14
viso      -0.14
addir     -0.14
Positive Logits
Token     Logit
very      0.43
very      0.35
muito     0.34
VERY      0.33
sehr      0.33
Very      0.33
Very      0.32
bardzo    0.31
molto     0.30
VERY      0.28
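On SAE dashboards like this one, the two token lists are typically the extremes of the feature's direct effect on the output logits. The sketch below assumes the scores come from projecting the feature's decoder vector through the model's unembedding matrix, ignoring the final layer norm. That is an assumption about how the numbers were produced, not this dashboard's actual code. The names `decoder_vec`, `unembed`, and `vocab` and all numbers are hypothetical toy values.

```python
from typing import List, Tuple


def logit_effects(decoder_vec: List[float], unembed: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through a
    d_model x vocab_size unembedding matrix: one score per vocabulary token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    scores: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted (positive) and k most-suppressed (negative) tokens."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])  # ascending by score
    positive = ranked[::-1][:k]  # highest scores first
    negative = ranked[:k]        # lowest scores first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["very", "sehr", "hte", "rror"]
unembed = [
    [0.5, 0.4, -0.1, -0.2],  # residual dimension 0
    [0.3, 0.2, -0.3, 0.1],   # residual dimension 1
]
decoder_vec = [1.0, 0.5]

scores = logit_effects(decoder_vec, unembed)
positive, negative = top_and_bottom(scores, vocab, k=2)
print(positive)  # "very" then "sehr"
print(negative)  # "hte" then "rror"
```

A real dashboard would run this over the full vocabulary. The lists above show ten tokens from each end.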
Activation density: 0.129%
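Activation density is usually the share of sampled tokens on which the feature fires at all. The minimal sketch below assumes that "fires" means an activation strictly above zero. The threshold parameter and the activation values are hypothetical and are not taken from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical run: 129 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(129):
    acts[i * 700] = 1.5  # distinct positions, all below 100,000

print(f"{activation_density(acts):.3f}%")  # 0.129%
```

At 0.129%, the feature fires on roughly one token in 775.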