INDEX
Explanations
specific articles like "a," "an," and "the."
Negative Logits
myſelf
-1.39
pleaſure
-1.37
himſelf
-1.35
purpoſe
-1.34
itſelf
-1.29
Jefus
-1.29
Theſe
-1.29
Monfieur
-1.28
faſt
-1.27
iſt
-1.23
Positive Logits
0.65
in
0.63
.
0.56
,
0.55
(
0.52
to
0.51
as
0.51
↵
0.50
with
0.49
[
0.47
Activations Density 0.068%