Explanations
content related to specific medical or scientific terms
Negative Logits
<eos>                   -0.58
in                      -0.52
(token not rendered)    -0.51
and                     -0.49
(token not rendered)    -0.47
of                      -0.47
[…]                     -0.45
.                       -0.45
...                     -0.45
…                       -0.44
Positive Logits
pleaſure                1.16
Jefus                   1.13
ⓧ                       1.10
Majefty                 1.07
+#+#                    1.07
itſelf                  1.07
myſelf                  1.07
Diſ                     1.06
ſever                   1.06
Monfieur                1.06
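The page does not say how the logit lists were produced. SAE feature dashboards commonly compute them by projecting the feature's decoder direction through the model's unembedding matrix and reporting the most extreme tokens, so the sketch below assumes that method. The inputs `decoder_vec`, `unembed`, and `vocab` and the function `top_logit_tokens` are hypothetical illustrations, not part of any real dashboard API.

```python
from typing import List, Sequence, Tuple


def top_logit_tokens(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumption (hypothetical inputs): decoder_vec has length d_model, and
    unembed is stored as d_model rows x vocab_size columns.
    """
    d_model = len(decoder_vec)
    # Direct logit effect on token j: dot product of the decoder
    # direction with column j of the unembedding matrix.
    logits = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
pos, neg = top_logit_tokens(
    [1.0, -1.0], [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]], ["a", "b", "c"], k=1
)
print(pos)  # [('c', 2.0)]
print(neg)  # [('b', -1.0)]
```

Under this assumption, tokens such as "pleaſure" and "Majefty" rank highest because their unembedding columns align most closely with the feature's decoder direction.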
Activation Density: 1.120%
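Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the source does not state the threshold or the dataset used, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose feature activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return 100.0 * active / total


# Two of ten toy token positions fire, giving a density of 20%.
print(activation_density([0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]))  # 20.0
```

Under this definition, a density of 1.120% means the feature is active on roughly 11 of every 1,000 tokens.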