INDEX
Explanations
version numbers with dots
words or patterns resembling old English or archaic forms
Negative Logits
Token   Logit
.       -0.59
a       -0.57
of      -0.54
di      -0.53
ol      -0.52
hal     -0.49
aus     -0.49
qu      -0.49
che     -0.49
aan     -0.49
Positive Logits
Token     Logit
raiſ      1.15
itſelf    1.06
myſelf    1.05
faſt      1.04
uſed      1.00
uſe       0.95
fevere    0.94
pleaſure  0.89
ſever     0.89
ſche      0.88
Activations Density 1.267%
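
One quick way to sanity-check the "archaic forms" explanation against the positive logits listed above is to count how many of those tokens contain the long s (ſ, U+017F). The sketch below is only an illustration: the token list is copied by hand from this page, and the variable names are hypothetical rather than part of any tool.

```python
# Top positive-logit tokens and values, copied by hand from the listing above.
positive_logits = [
    ("raiſ", 1.15),
    ("itſelf", 1.06),
    ("myſelf", 1.05),
    ("faſt", 1.04),
    ("uſed", 1.00),
    ("uſe", 0.95),
    ("fevere", 0.94),
    ("pleaſure", 0.89),
    ("ſever", 0.89),
    ("ſche", 0.88),
]

LONG_S = "\u017f"  # 'ſ', the archaic long s

# Split the tokens by whether they contain the long s.
with_long_s = [tok for tok, _ in positive_logits if LONG_S in tok]
without_long_s = [tok for tok, _ in positive_logits if LONG_S not in tok]

print(f"{len(with_long_s)} of {len(positive_logits)} tokens contain the long s")
print("tokens without it:", without_long_s)
```

Under these assumptions, all but one of the listed tokens contain the long s, which fits the second explanation (archaic spellings) more closely than the first (version numbers with dots).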