INDEX
Explanations
references to temporal relationships and conditions in a context
Negative Logits
as       -0.78
all      -0.74
get      -0.73
in       -0.72
where    -0.71
no       -0.69
can      -0.68
for      -0.68
a        -0.68
is       -0.66
Positive Logits
also     1.36
then     1.33
now      1.27
been     1.27
there    1.26
because  1.25
since    1.24
itſelf   1.23
again    1.23
Theſe    1.23
Activations Density 0.359%