INDEX
Explanations
negations or expressions of impossibility
Negative Logits
Token            Logit
                 -0.74
God              -0.65
-                -0.63
(                -0.63
A                -0.63
R                -0.63
,                -0.61
for              -0.60
Y                -0.59
:                -0.58
Positive Logits
Token            Logit
estekak          1.37
propOrder        1.32
дописавши        1.30
betweenstory     1.29
tartalomajánló   1.28
expandindo       1.25
__':             1.25
للاسماء          1.25
aarrggbb         1.21
+#+#             1.20
Activations Density 0.236%
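
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below shows one way such a number could be computed. It assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the sample values are hypothetical and are not taken from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 1000 tokens, 2 of which are nonzero.
sample = [0.0] * 997 + [0.4, 1.2, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.236% would correspond to the feature being active on roughly 2 to 3 of every 1000 sampled tokens.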