Explanations
special formatting or symbols in the text
Negative Logits
RegressionTest  -1.10
ſy              -1.03
myſelf          -1.02
purpoſe         -1.01
uſed            -0.98
pleaſure        -0.94
fevere          -0.94
preſent         -0.94
reaſon          -0.93
propOrder       -0.92
Positive Logits
s               0.85
</sub>          0.72
</i>            0.70
}}$             0.69
̈               0.68
}}              0.68
/}              0.67
</em>           0.67
i               0.66
</sup>          0.65
Activations Density 0.258%