Explanations
Formal mathematical properties and statements related to definitions and theorems
Negative Logits
(unrendered token)  -0.51
Ne  -0.50
:  -0.48
[  -0.47
ברי  -0.47
typeparam  -0.47
(  -0.46
(unrendered token)  -0.45
s  -0.45
mode  -0.45
Positive Logits
itſelf  1.20
myſelf  1.15
houſe  1.13
purpoſe  1.11
pleaſure  1.06
laſt  0.99
perſon  0.96
ſeveral  0.96
greateſt  0.96
ſtand  0.94
Activation Density: 0.240%
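As a rough illustration of what the density figure means, here is a minimal sketch. It assumes that activation density is the fraction of token positions on which the feature's activation is strictly greater than zero. The function name `activation_density` and the 5,000-token sample are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as "firing" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 12 firing positions out of 5,000 tokens.
acts = [0.0] * 4988 + [1.5] * 12
density = activation_density(acts)
print(f"{density:.3%}")  # 0.240%
```

Under this reading, a density of 0.240% means the feature fires on roughly 1 token in about 417.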