Explanations
the beginning of a new document or section
Negative Logits
","    -0.70
"–"    -0.69
"("    -0.69
"-"    -0.63
"."    -0.63
":"    -0.59
"in"   -0.58
"B"    -0.56
"i"    -0.56
Positive Logits
"myſelf"          1.40
"pleaſure"        1.39
"purpoſe"         1.36
"houſe"           1.34
"itſelf"          1.34
"ſelf"            1.33
"GenerationType"  1.33
"greateſt"        1.24
"Houſe"           1.23
"Anſ"             1.22
Activation density: 0.112%