Explanations
expressions of love and enjoyment towards various subjects
expressing strong positive feeling
Negative Logits
ை          -0.45
счет       -0.43
Sycamore   -0.42
tabPage    -0.42
]")]       -0.41
쇼         -0.40
asu        -0.40
TagMode    -0.40
treason    -0.40
счёт       -0.40
Positive Logits
loved      1.52
Loved      1.48
Loved      1.44
loved      1.41
LOVED      1.30
liked      1.20
liked      1.02
Liked      0.99
Liked      0.97
LookAnd    0.81
Activations Density: 0.003%