INDEX
Explanations
phrases related to personal relationships and emotional attachments
Negative Logits
efe
-0.16
ırak
-0.15
cater
-0.14
Nico
-0.14
bread
-0.14
491
-0.14
Miy
-0.14
ãĤ¤ãĥ¤
-0.14
unicorn
-0.13
Nic
-0.13
Positive Logits
azon
0.17
avin
0.17
áŀ¶
0.16
elas
0.16
@{
0.15
olest
0.14
chwitz
0.14
æİĽ
0.14
ndon
0.14
å¤Ħ
0.14
Activations Density 0.004%