INDEX
Explanations
instances of high numerical values or significant figures in various contexts
Negative Logits
eux      -0.17
ragaz    -0.14
빙       -0.14
ymes     -0.14
нього    -0.14
THEM     -0.14
ká       -0.14
него     -0.13
是我     -0.13
yas      -0.13
Positive Logits
Ù쨥ÙĨ    0.28
there    0.28
we       0.25
it       0.22
thì      0.22
there    0.21
they     0.20
we       0.19
nobody   0.18
nothing  0.18
Activations Density 0.448%