Explanations
strong formatting tags indicating emphasis or structure in text
Negative Logits
Token       Logit
-           -0.53
<0xE1>      -0.52
(           -0.52
            -0.50
.           -0.49
on          -0.46
Z           -0.46
A           -0.45
in          -0.44
a           -0.44
Positive Logits
Token       Logit
itſelf      1.50
pleaſure    1.44
myſelf      1.43
―――――       1.36
purpoſe     1.36
themſelves  1.36
ſelf        1.36
Monfieur    1.35
raiſ        1.35
Majefty     1.34
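Token/logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes exactly that direct-path computation. The function name `top_logit_effects`, the `W_U` layout (d_model rows by vocab_size columns), and the absence of any layer-norm or other scaling are assumptions for illustration, not a description of the pipeline that produced the numbers above.

```python
from typing import List, Sequence, Tuple

def top_logit_effects(
    feature_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: the k most positive and k most negative direct logit effects.

    feature_dir: the feature's decoder direction, length d_model (assumed).
    W_U: unembedding matrix, d_model rows of vocab_size columns (assumed layout).
    No layer norm or other scaling is applied (assumption).
    """
    vocab_size = len(vocab)
    # logit[t] = sum_j feature_dir[j] * W_U[j][t], i.e. feature_dir @ W_U
    logits = [
        sum(feature_dir[j] * W_U[j][t] for j in range(len(feature_dir)))
        for t in range(vocab_size)
    ]
    # Token indices ordered from most negative to most positive effect
    ranked = sorted(range(vocab_size), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in ranked[:k]]             # most negative first
    positive = [(vocab[t], logits[t]) for t in reversed(ranked[-k:])]  # most positive first
    return positive, negative
```

With a toy 2-dimensional model and a four-token vocabulary, `top_logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0]], ["a", "b", "c", "d"], k=2)` returns `([("c", 2.0), ("a", 0.5)], [("b", -1.0), ("d", 0.0)])`, mirroring the two lists above: positive effects sorted descending, negative effects sorted ascending.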
Activation density: 0.117%
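Activation density is typically reported as the fraction of sampled token positions on which the feature fires. The minimal sketch below assumes "fires" means an activation strictly above a threshold (the hypothetical `threshold` parameter, defaulting to 0). The sample size and threshold behind the 0.117% figure are not stated here, so treat this as an illustration of the ratio rather than the exact procedure used.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions where the activation exceeds threshold."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered active
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)
```

For example, a feature that fires on 117 of 100,000 sampled tokens gives `activation_density(...) == 117 / 100_000`, which displays as 0.117%.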