Explanations
mentions of love or affection towards someone
instances of the word "loved" and expressions of affection or emotional connections
Negative Logits
statement     -0.69
======        -0.64
deposition    -0.64
report        -0.63
dm            -0.61
admin         -0.61
DM            -0.61
system        -0.59
blocking      -0.59
minimum       -0.58
Positive Logits
loved         3.89
loves         2.06
hated         1.85
liked         1.79
love          1.72
beloved       1.63
cherished     1.60
loving        1.60
disliked      1.48
admired       1.46
Activations Density 0.018%