Explanations
expressions of preference or liking
"like" or "likes" followed by infinitives or objects
Negative Logits
getDoctrine     -0.64
ات              -0.61
se              -0.58
ResumeLayout    -0.56
arXiv           -0.56
pub             -0.55
quanto          -0.55
devServer       -0.55
mencapai        -0.53
dymyr           -0.53
Positive Logits
liest           0.84
EconPapers      0.82
+#+#            0.74
lihood          0.72
évaluateur      0.67
postIndex       0.67
dislike         0.65
Мексичка        0.65
ivelany         0.64
👍👍            0.64
Activations Density: 0.050%