INDEX
Explanations
expressions related to personal preferences and tastes
Negative Logits
يتيمه           -0.45
锈钢            -0.45
pakah           -0.45
bootstrapcdn    -0.44
曖昧さ回避      -0.44
Meksiku         -0.44
differentiate   -0.43
allAfrica       -0.43
tanleria        -0.41
∭               -0.41
Positive Logits
preferences     0.60
liking          0.58
preferences     0.58
Loves           0.56
gustaba         0.55
lover           0.55
gustan          0.55
preference      0.54
Preferences     0.54
loves           0.54
Activations Density 0.298%