INDEX
Explanations
expressions of personal identity and experiences
Negative Logits
Token     Logit
imid      -0.15
ousel     -0.14
overn     -0.14
anean     -0.13
Leaf      -0.13
olina     -0.13
eshire    -0.13
ugas      -0.13
>\<^      -0.13
aghan     -0.13
Positive Logits
Token     Logit
my        0.20
我的      0.19
uni       0.18
meiner    0.17
我        0.17
是我      0.16
me        0.16
minha     0.15
415       0.15
，我      0.15
Activations Density 0.154%