INDEX
Explanations
expressions of preference or affection
Comes before opinions or preferences
expressing liking
Negative Logits
endwhile        -0.49
بوابة           -0.48
<bos>           -0.47
breathtaking    -0.45
devastating     -0.45
breakthroughs   -0.43
سكانية          -0.42
Burnham         -0.42
^               -0.41
lastonbury      -0.41
Positive Logits
Dislikes          0.67
InputDecoration   0.66
liked             0.64
Liked             0.61
Liked             0.60
dislike           0.59
liked             0.59
Likes             0.56
Likes             0.54
gusta             0.54
Activations Density 0.064%
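
The density figure above is most naturally read as the fraction of token positions on which the feature fires at all. The page does not state how it was computed, so the sketch below is an assumption: the function name, the zero threshold, and the sample data are hypothetical and chosen only to show the arithmetic behind a value like 0.064%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is
    active. The threshold of 0.0 is a hypothetical default, not taken from
    the dashboard.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 64 of them active.
sample = [0.0] * 99_936 + [1.0] * 64
density = activation_density(sample)
print(f"{density:.3%}")
```

At this density the feature would be active on roughly 64 out of every 100,000 tokens, which fits a narrow feature tied to a small family of "like"/"dislike" tokens.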