Explanations
phrases that emphasize possession and personal connection
Negative Logits
jan      -0.15
orz      -0.15
famed    -0.14
gel      -0.14
ties     -0.14
awn      -0.14
äm       -0.14
parent   -0.14
ALES     -0.13
ebb      -0.13
Positive Logits
desired  0.18
choice   0.16
desired  0.15
íĭĢ      0.15
venes    0.15
wil      0.15
avin     0.15
vest     0.15
onne     0.14
Ñıз      0.14
Activations Density 0.169%