INDEX
Explanations
numeric values and identifiers in a structured format
Negative Logits
Token   Logit
,       -0.71
in      -0.60
and     -0.54
a       -0.53
to      -0.52
on      -0.52
of      -0.51
as      -0.51
(       -0.49
.       -0.49
Positive Logits
Token       Logit
ві          0.28
меш         0.27
населення   0.25
прип        0.25
осіб        0.25
дор         0.25
від         0.24
розп        0.23
особи       0.23
понад       0.23
Activations Density 0.002%