INDEX
Explanations
phrases indicating negation or limitation in context
Negative Logits
Token       Logit
-           -0.84
,           -0.78
a           -0.75
.           -0.71
:           -0.69
the         -0.68
–           -0.67
-           -0.66
et          -0.62
of          -0.62
Positive Logits
Token       Logit
+#+#        1.05
########.   1.02
tslint      0.99
".          0.98
Савезне     0.94
httphttps   0.93
Мексичка    0.90
་་          0.90
✨:          0.90
$_"         0.89
Activations Density: 0.487%