Explanations
expressions of fondness or affection
Negative Logits
Token      Logit
orgot      -0.16
angelo     -0.16
undo       -0.15
eer        -0.15
een        -0.15
uels       -0.15
andler     -0.15
uze        -0.15
iko        -0.15
eyer       -0.14
Positive Logits
Token      Logit
amental     0.30
ue          0.24
amentals    0.22
ness        0.18
ksiyon      0.16
memories    0.16
ぼ          0.15
akk         0.15
hin         0.15
scal        0.15
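The logit lists above rank vocabulary tokens by how strongly this feature raises or lowers their output logits. One common way to build such lists is to project the feature's decoder direction through the model's unembedding matrix and keep the k largest and k smallest entries. This page does not confirm that it uses this method, so treat the sketch below as an assumption. It is written in plain Python. The names `decoder_dir`, `unembed`, `logit_attribution`, and `top_and_bottom` are hypothetical, and the toy numbers are for illustration only.

```python
def logit_attribution(decoder_dir: list[float],
                      unembed: list[list[float]]) -> list[float]:
    """Project a feature's decoder direction (length d_model) through a
    hypothetical unembedding matrix (d_model rows x vocab columns),
    giving one logit effect per vocabulary token."""
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_and_bottom(logits: list[float], tokens: list[str], k: int):
    """Return the k most positive and k most negative (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda t: logits[t])  # ascending
    bottom = [(tokens[t], logits[t]) for t in order[:k]]
    top = [(tokens[t], logits[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example: d_model = 2, vocabulary of 3 tokens.
effects = logit_attribution([1.0, -0.5],
                            [[0.2, 0.0, -0.1],
                             [0.0, 0.4, 0.6]])
top, bottom = top_and_bottom(effects, ["a", "b", "c"], k=1)
print(top, bottom)
```

The real lists would use the feature's actual decoder row and the full vocabulary, with k = 10 to match the tables above.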
Activation density: 0.005%
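Activation density is usually read as the fraction of token positions on which the feature fires. This sketch assumes that definition, with "fires" meaning an activation strictly above zero. Under that reading, a density of 0.005% corresponds to about 5 active tokens out of every 100,000. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes "active" means strictly greater than the threshold (0 by default).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


# 5 active positions out of 100,000 tokens.
d = activation_density([0.0] * 99_995 + [1.0] * 5)
print(f"{d:.3%}")  # prints 0.005%
```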