INDEX
Explanations
the word "indeed" and, to a lesser extent, words related to time and identity
Negative Logits
Monfieur  -1.95
myſelf    -1.91
Efq       -1.91
pleaſure  -1.81
auffi     -1.80
ſmall     -1.80
faſt      -1.77
ſeveral   -1.76
iſt       -1.76
itſelf    -1.72
Positive Logits
,    1.20
(    1.04
and  1.00
-    0.98
     0.92
.    0.92
in   0.86
l    0.83
as   0.78
:    0.78
Activations Density 1.627%