INDEX
Explanations
emphasis on words that denote moderation or limitation in statements
Negative Logits
анси
-0.14
ικός
-0.13
\<^
-0.13
ema
-0.13
ندا
-0.13
okud
-0.13
eden
-0.13
ема
-0.13
zeichnet
-0.13
vertise
-0.13
Positive Logits
a
0.20
the
0.20
inton
0.15
an
0.15
those
0.14
ehr
0.14
uria
0.14
zell
0.14
one
0.14
ither
0.13
Activations Density 0.227%