INDEX
Explanations
expressions of positive sentiment or affection towards people, objects, or experiences
Negative Logits
stav    -0.16
elles   -0.15
apur    -0.15
-uri    -0.14
DY      -0.14
ĵ¨      -0.14
arkan   -0.14
IEW     -0.14
rices   -0.14
.dy     -0.13
Positive Logits
asha     0.18
ester    0.16
able     0.15
overall  0.15
olio     0.15
olt      0.15
olie     0.14
-Ñģ      0.14
acker    0.14
DET      0.14
Activations Density: 0.067%