Explanations
expressions related to opinions or beliefs
Negative Logits
ourselves  -0.23
，我们  -0.22
我们的  -0.21
کنیم  -0.20
داریم  -0.19
Them  -0.18
abbiamo  -0.18
immel  -0.18
Them  -0.18
.We  -0.18
Positive Logits
me  1.05
me  0.57
меня  0.54
_me  0.48
-me  0.47
ME  0.46
мне  0.45
Me  0.44
.me  0.42
me  0.40
Activations Density 0.261%