Explanations
possessive pronouns followed by positive emotions or actions
references to love and its complexities
Negative Logits
Token      Logit
millenn    -0.88
Module     -0.78
rax        -0.78
Imran      -0.77
redund     -0.76
uthor      -0.76
rm         -0.75
ickets     -0.75
ulhu       -0.74
mins       -0.74
Positive Logits
Token      Logit
Love        2.16
love        2.01
Love        2.00
love        1.96
LOVE        1.87
loving      1.52
loves       1.51
Loving      1.50
romance     1.46
lover       1.43
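In dashboards like this one, the positive and negative logit lists are usually produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token results. The sketch below assumes that definition. The names `top_logit_effects`, `decoder_dir`, and `unembed` are hypothetical, and the toy numbers in the usage example are invented rather than taken from this feature.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumed definition (hypothetical helper, not a specific library's API):
      decoder_dir: list of d_model floats, the feature's decoder direction
      unembed:     d_model rows, each a list of vocab_size floats (W_U)
      vocab:       list of vocab_size token strings
    Returns (positive, negative): the k highest and k lowest (token, effect) pairs.
    """
    d_model = len(decoder_dir)
    # Effect on token j is the dot product of the decoder direction with column j of W_U.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by effect: the most negative tokens come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy usage with made-up values (d_model = 2, vocab_size = 3).
decoder_dir = [1.0, -0.5]
unembed = [[2.0, 0.0, -1.0],
           [0.0, 1.0, 1.0]]
vocab = ["love", "Module", "rax"]
positive, negative = top_logit_effects(decoder_dir, unembed, vocab, k=1)
```

Here `love` gets 1.0 × 2.0 + (−0.5) × 0.0 = 2.0 and `rax` gets 1.0 × (−1.0) + (−0.5) × 1.0 = −1.5, so they head the positive and negative lists respectively.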
Activation Density: 0.293%
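Activation density is commonly reported as the fraction of tokens on which the feature fires, meaning its activation is above zero. Under that assumed definition, 0.293% means the feature is active on roughly 3 of every 1,000 tokens. The helper `activation_density` and its `threshold` parameter below are hypothetical and are shown only as a minimal sketch.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper; assumes "density" means the share of tokens
    on which the feature is active (activation > threshold).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy usage: the feature fires on 1 of 4 tokens, giving a density of 0.25 (25%).
density = activation_density([0.0, 0.0, 3.2, 0.0])
```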