INDEX
Explanations
negative sentiments or criticisms
following "pre-" or "anti-"
prefixes followed by word parts
Negative Logits
-     -1.26
,     -0.92
.     -0.91
:     -0.84
(     -0.76
!     -0.72
/     -0.72
of    -0.69
a     -0.69
in    -0.67
Positive Logits
########.     1.30
للمعارف     1.30
itſelf        1.29
tvguidetime   1.15
་་            1.12
كومونز     1.10
$_"           1.07
".            1.05
#             1.03
photolibrary  1.02
Activations Density: 0.577%