Explanations
references to preferences or comparisons involving "like" or "likes."
the likes of
Negative Logits
Token           Logit
<bos>           -0.53
embatan         -0.51
ctomy           -0.50
enderal         -0.49
Towels          -0.49
Oatmeal         -0.48
Potato          -0.47
Schot           -0.47
setCellValue    -0.47
omotor          -0.47
Positive Logits
Token           Logit
likes           1.53
likes           1.18
Likes           1.09
Likes           1.06
like            0.71
IKES            0.68
LIKE            0.63
gosta           0.60
Like            0.59
liked           0.59
Activations Density: 0.003%