INDEX
Explanations
emotional expressions and sentiments related to enjoyment or pleasure
Negative Logits
Token   Logit
uyu     -0.17
bon     -0.17
agra    -0.17
bu      -0.17
plete   -0.16
501     -0.16
wi      -0.16
und     -0.15
trib    -0.15
ka      -0.15
Positive Logits
Token   Logit
/lo     0.19
Lo      0.19
annis   0.18
elia    0.18
iola    0.17
ngại    0.17
aic     0.17
Lo      0.17
ely     0.16
eness   0.16
Activations Density: 0.011%