INDEX
Explanations
phrases that express appreciation and motivation
Negative Logits
my        -0.27
myself    -0.27
让我      -0.25
给我      -0.24
mine      -0.23
мне       -0.22
tôi       -0.22
我的      -0.22
mijn      -0.21
meu       -0.21
Positive Logits
we        0.44
our       0.42
ourselves 0.40
我们      0.36
我们的    0.34
我們      0.33
Our       0.33
ours      0.32
our       0.32
，我们    0.32
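The dashboard does not say how the two logit lists were computed. Feature dashboards for sparse autoencoders commonly project the feature's decoder direction through the model's unembedding matrix and then report the lowest-scoring and highest-scoring tokens. This sketch assumes that convention. The names `logit_effects`, `decoder_vec` and `unembed` are hypothetical, and the toy numbers are illustrative only, not this feature's weights.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Return (bottom_k, top_k) lists of (token, score) pairs.

    Hypothetical helper. Assumes:
      decoder_vec: feature decoder direction, length d_model
      unembed:     d_model x vocab_size matrix as nested lists (W_U)
      vocab:       token strings, length vocab_size
    """
    d_model = len(decoder_vec)
    # Score each vocabulary token by the dot product of the decoder
    # direction with that token's column of the unembedding matrix.
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending: the most negative logit effects come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom_k = ranked[:k]
    top_k = ranked[::-1][:k]  # reverse for the most positive effects
    return bottom_k, top_k


# Toy example: d_model = 2, vocabulary of 5 tokens (illustrative values).
vocab = ["my", "me", "the", "we", "our"]
unembed = [
    [-0.3, -0.2, 0.0, 0.4, 0.3],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
bottom, top = logit_effects([1.0, 0.0], unembed, vocab, k=2)
print("negative:", bottom)
print("positive:", top)
```

Real dashboards would do the same projection with a matrix library over the full vocabulary. The plain-Python loop here only keeps the sketch dependency-free.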
Activation Density: 0.036%
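The page does not define "activation density". This sketch assumes a common reading: the percentage of sampled token positions at which the feature's activation is above zero. On that reading, 0.036% means roughly 36 firing positions per 100,000 tokens. `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature fires.

    Hypothetical helper; assumes "fires" means activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 36 firing positions out of 100,000 sampled tokens.
sample = [1.0] * 36 + [0.0] * (100_000 - 36)
print(f"{activation_density(sample):.3f}%")
```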