Explanations
actions and processes related to change, creation, and functioning dynamics
Negative Logits
their   -0.24
the     -0.23
that    -0.22
to      -0.22
er      -0.22
they    -0.20
eh      -0.20
than    -0.20
test    -0.19
this    -0.19
Positive Logits
itself  0.35
heets   0.28
’       0.24
'       0.24
cales   0.23
cape    0.21
cribes  0.21
собой   0.20
izes    0.20
creens  0.19
Activations Density 0.742%