Explanations
expressions of preference or enjoyment
Negative Logits
Hernandez       -0.75
Asher           -0.71
èvre            -0.66
اهرة            -0.63
Fernández       -0.62
brazos          -0.62
Gem             -0.62
<h6>            -0.62
PathVariable    -0.61
凸              -0.60
Positive Logits
Likes           1.13
liked           1.12
Liked           1.05
liking          1.04
likes           1.03
Likes           1.01
Lik             0.97
likes           0.96
dislike         0.95
liked           0.90
Activations Density: 0.056%