Explanations
instances of the term "love" and its variations
Negative Logits
erer    -0.19
ubern   -0.17
ucer    -0.16
anker   -0.16
alist   -0.15
auf     -0.15
ÜRK     -0.15
уки     -0.15
eker    -0.15
ermen   -0.14
Positive Logits
eland     0.30
eliness   0.29
ett       0.27
ell       0.25
esome     0.24
ely       0.24
estr      0.23
estone    0.23
ells      0.22
emarks    0.22
Activations Density 0.006%