Explanations
instances of the word "love" and its variations
Negative Logits
urch        -0.18
ëı          -0.17
iff         -0.16
æijĩ        -0.15
erer        -0.15
Ùħا         -0.15
iffs        -0.14
kontakte    -0.14
prés        -0.14
geber       -0.14
Positive Logits
eliness     0.23
ely         0.19
vv          0.18
renc        0.18
ullo        0.18
alker       0.16
ett         0.16
ell         0.16
ohl         0.16
ania        0.15
Activations Density: 0.005%