Explanations
instances of mathematical notation and references
Negative Logits
Token   Logit
.       -0.85
,       -0.75
(       -0.73
a       -0.71
        -0.70
S       -0.69
:       -0.69
)       -0.68
in      -0.66
M       -0.63
Positive Logits
Token        Logit
itſelf       1.67
myſelf       1.56
Majefty      1.54
Reſ          1.53
Monfieur     1.53
themſelves   1.51
himſelf      1.51
ſeveral      1.50
ſelf         1.49
pleaſure     1.48
Activations Density 0.318%