INDEX
Explanations
phrases expressing uncertainty about actions or knowledge
Negative Logits
―――――       -1.20
itſelf      -1.20
Efq         -1.19
pleaſure    -1.18
ſind        -1.15
ſeveral     -1.15
Majefty     -1.15
Anſ         -1.14
Jefus       -1.12
Monfieur    -1.12
Positive Logits
.           0.66
a           0.65
(           0.65
            0.61
,           0.60
in          0.57
and         0.56
I           0.55
of          0.54
-           0.53
Activations Density 0.126%