Explanations
mathematical operators and notation used in formal definitions
Negative Logits
Token   Logit
s       -0.78
D       -0.74
K       -0.74
i       -0.74
d       -0.73
S       -0.72
K       -0.72
l       -0.71
Men     -0.71
z       -0.70
Positive Logits
Token        Logit
myſelf       1.58
Jefus        1.44
ſelves       1.43
pleaſure     1.43
themſelves   1.40
purpoſe      1.38
raiſ         1.36
himſelf      1.32
juſt         1.29
uſed         1.29
Activations Density 0.792%