Explanations
expressions of affection and love
Negative Logits
mín          -0.61
Whitfield    -0.58
muligt       -0.56
Perugia      -0.56
unlikely     -0.55
dolci        -0.55
occurred     -0.54
auraient     -0.54
secours      -0.53
restantes    -0.53
Positive Logits
loves        1.17
loved        1.13
love         1.08
loves        1.05
loved        1.04
Loves        1.03
liked        1.00
hates        0.97
Loves        0.96
likes        0.92
Activations Density: 0.069%