INDEX
Explanations
phrases indicating a method or manner of doing something
Negative Logits
pleaſure      -1.34
myſelf        -1.16
ſche          -1.13
themſelves    -1.12
raiſ          -1.12
faſt          -1.12
ſta           -1.10
poffible      -1.09
becauſe       -1.05
ſever         -1.05
Positive Logits
that          0.87
is            0.64
can           0.63
may           0.60
a             0.58
,             0.56
</h2>         0.53
was           0.52
made          0.51
cela          0.51
Activations Density: 0.028%