Explanations
roles or titles associated with individuals in various professions or activities
Negative Logits
Token     Logit
another   -0.19
our       -0.18
their     -0.17
hoặc      -0.17
或        -0.17
或者      -0.16
a         -0.16
EITHER    -0.16
either    -0.16
або       -0.16
Positive Logits
Token       Logit
-turned     0.43
extra       0.41
turned      0.38
-extra      0.38
extra       0.34
,           0.32
extraordin  0.31
turned      0.29
Extra       0.28
and         0.28
Activations Density 0.181%
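As a rough illustration of the density figure, the sketch below computes the fraction of sampled tokens on which a feature fires. This assumes "density" means the share of tokens with activation above zero, which the page does not state; `activation_density` and its `threshold` parameter are hypothetical names used only for this example.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 181 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 181 + [0.0] * (100_000 - 181)
density = activation_density(sample)
formatted = f"{density:.3%}"
```

With this synthetic sample, `formatted` is `0.181%`, matching the scale of the value reported above. The sample is made up for the example and is not the data behind the dashboard.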