Explanations
headings or titles that introduce sections of content
Negative Logits
a    -1.12
the  -1.02
in   -0.95
of   -0.93
     -0.92
her  -0.90
it   -0.87
to   -0.84
an   -0.84
I    -0.82
Positive Logits
Anſ       1.53
itſelf    1.52
Efq       1.49
ſche      1.49
myſelf    1.48
pleaſure  1.48
Monfieur  1.48
ſtate     1.47
purpoſe   1.44
Jefus     1.42
Activations Density 0.636%
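
The density figure can be read as the share of sampled token positions on which the feature fires. The sketch below assumes that reading (activation strictly above a threshold of 0 counts as "active"); the function name `activation_density` and the sample data are hypothetical illustrations, not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active; the dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 6 of them active.
sample = [0.0] * 994 + [0.4, 1.2, 0.7, 2.5, 0.1, 3.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.600%
```

Under this reading, a density of 0.636% would correspond to the feature being active on roughly 6 or 7 of every 1000 sampled tokens.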