INDEX
Explanations
emotional expressions related to feelings and sentiments
Negative Logits
figure       -0.16
unos         -0.16
itals        -0.16
../../../    -0.15
otos         -0.15
LETTE        -0.15
olas         -0.15
esian        -0.14
anje         -0.14
oler         -0.14
Positive Logits
lessly       0.25
-good        0.22
inspace      0.21
ings         0.20
sorry        0.20
Sorry        0.19
thy          0.18
sorry        0.17
Sorry        0.16
inkel        0.16
Activations Density 0.051%