INDEX
Explanations
affectionate terms and references to shortcomings
terms of endearment
Negative Logits
selaer    -0.56
Vietnam   -0.53
URBANA    -0.52
Kanye     -0.52
iastes    -0.51
ویکی      -0.51
DNC       -0.51
Bihar     -0.50
bmx       -0.50
Process   -0.49
Positive Logits
Darling     1.84
Darling     1.79
darling     1.74
sweetheart  0.79
dearest     0.77
Dearest     0.71
Adorable    0.63
adorable    0.62
dear        0.61
querida     0.61
Activations Density: 0.002%