Explanations
expressions of personal preferences and feelings
followed by positive sentiment
expressing strong positive feelings
Negative Logits
Token         Logit
клопе         -0.52
<!--[         -0.48
مشين          -0.47
LEP           -0.45
Aware         -0.44
alay          -0.44
esperienze    -0.44
IPT           -0.42
Aware         -0.42
GAZ           -0.42
Positive Logits
Token         Logit
liked         1.95
love          1.86
loved         1.84
liking        1.78
loves         1.74
LOVE          1.59
LOVED         1.57
likes         1.53
loved         1.50
liked         1.49
Activations Density: 0.369%