Explanations
phrases that indicate feelings and personal significance
Negative Logits
ü        -0.17
ysis     -0.16
caa      -0.16
slip     -0.15
Cabin    -0.15
htar     -0.13
bare     -0.13
Sche     -0.13
este     -0.13
amiliar  -0.13
Positive Logits
me       0.30
us       0.21
æĪij      0.18
tôi      0.17
saya     0.16
íĨµ      0.16
INGTON   0.16
atham    0.15
VID      0.15
ovel     0.15
Activations Density 0.116%