INDEX
Explanations
language related to legal or financial obligations
Negative Logits
pleaſure    -1.35
houſe       -1.28
purpoſe     -1.27
Majefty     -1.26
Monfieur    -1.21
myſelf      -1.20
faſt        -1.16
Houſe       -1.14
ſche        -1.13
Jefus       -1.13
Positive Logits
human       0.44
de          0.43
no          0.42
so          0.41
pe          0.41
some        0.38
Edited      0.38
perhaps     0.38
[           0.38
un          0.38
Activations Density: 2.771%