Explanations
references to loved ones and emotional connections to them
Negative Logits
Token            Logit
LOVE             -0.18
lover            -0.17
любов            -0.17
loo              -0.17
.scalablytyped   -0.17
love             -0.16
loving           -0.16
lovers           -0.16
ková             -0.15
love             -0.15
Positive Logits
Token            Logit
ones              0.28
ones              0.25
Ones              0.23
olls              0.19
relative          0.17
ammers            0.17
errick            0.15
ONES              0.15
who               0.14
pet               0.14
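The dashboard does not state how these logit lists are produced. A common approach in sparse-autoencoder dashboards, assumed here rather than confirmed for this page, is to project the feature's decoder direction through the model's unembedding matrix and rank every vocabulary token by the result: the largest values form the positive list and the most negative values form the negative list. The sketch below uses toy numbers, and the helper names `logit_effects` and `top_k` are hypothetical.

```python
# Hypothetical sketch: rank tokens by the direct logit effect of one feature.
# Assumption: the effect on token t is the dot product of the feature's
# decoder direction with column t of the unembedding matrix.

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: dot(decoder_dir, unembed[:, t])} for each vocab token."""
    effects = {}
    for t, token in enumerate(vocab):
        # unembed has one row per residual-stream dimension (d_model x vocab)
        effects[token] = sum(d * row[t] for d, row in zip(decoder_dir, unembed))
    return effects


def top_k(effects, k, positive=True):
    """Top-k tokens by effect: largest if positive, most negative otherwise."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy 2-dimensional residual stream and a 4-token vocabulary (made-up values).
vocab = ["ones", "relative", "love", "lover"]
unembed = [
    [0.5, 0.3, -0.2, -0.1],
    [0.1, 0.2, -0.4, -0.3],
]
decoder_dir = [0.4, 0.2]

effects = logit_effects(decoder_dir, unembed, vocab)
print(top_k(effects, 2, positive=True))
print(top_k(effects, 2, positive=False))
```

With these toy values, "ones" and "relative" rank as the top positive tokens and "love" and "lover" as the most negative, mirroring the shape of the tables above without reproducing their actual numbers.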
Activation Density: 0.008%
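An activation density of 0.008% means the feature fires on roughly 8 of every 100,000 tokens. The sketch below assumes density is the fraction of tokens whose activation is strictly greater than zero; the page does not define the threshold, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (assumed 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: a feature that fires on 8 of 100,000 tokens.
acts = [0.0] * 100_000
for i in (10, 2_000, 15_000, 30_000, 45_000, 60_000, 75_000, 99_999):
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 8 / 100,000 = 0.008%
```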