Explanations
occurrences of the word "we" and related pronouns
Negative Logits
ours      -0.23
our       -0.21
我们的    -0.18
akan      -0.18
noss      -0.17
agog      -0.16
我们      -0.16
nostro    -0.16
нами      -0.16
nostra    -0.15
Positive Logits
hart      0.17
kker      0.15
tz        0.15
lier      0.15
awei      0.15
504       0.14
cape      0.14
ROI       0.14
405       0.14
erral     0.13
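The dashboard does not say how the logit lists are produced. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the most positive and most negative entries. The sketch below illustrates that idea under this assumption. The function names `logit_effects` and `top_and_bottom`, as well as the toy matrix, are hypothetical and are not taken from this page.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: compute decoder_dir @ w_unembed.

    decoder_dir has length d_model; w_unembed has d_model rows and
    vocab_size columns. Returns one logit effect per vocabulary token.
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[j] * w_unembed[j][t] for j in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy data (d_model = 2, vocab of 4 tokens), invented for illustration only.
    vocab = ["ours", "our", "hart", "cape"]
    w_u = [[-0.2, -0.1, 0.1, 0.05],
           [0.06, 0.2, -0.1, 0.0]]
    pos, neg = top_and_bottom(logit_effects([1.0, -0.5], w_u), vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Under this reading, a negative value means the feature pushes the model away from predicting that token. For this feature, that would be "ours", "our", and their equivalents in other languages.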
Activation Density 0.103%
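The page does not define activation density. The sketch below assumes it is the percentage of sampled tokens on which the feature's activation is above zero. The threshold, the dataset, and the helper name `activation_density` are all assumptions made for illustration.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:  # count the token as "active" only above the threshold
            firing += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * firing / total
```

Under this definition, a density of 0.103% means the feature fires on roughly 103 of every 100,000 tokens. That is consistent with a narrow feature tied to first-person plural pronouns.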