INDEX
Explanations
symbols or expressions denoting mathematical relationships or conditions
Negative Logits
y           -0.79
&           -0.66
lgari       -0.66
W           -0.65
<           -0.64
Peterson    -0.64
Champ       -0.64
Willoughby  -0.64
Gentry      -0.64
ud          -0.63
Positive Logits
themſelves  1.20
myſelf      1.19
Diſ         1.16
Theſe       1.14
&\          1.12
himſelf     1.12
itſelf      1.11
ſeveral     1.07
&\          1.06
Anſ         1.05
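Logit lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes that convention. The helper `logit_weights` and its parameters (`decoder_row`, `unembed`, `vocab`, `k`) are hypothetical, and real tools may also fold in the final layer norm or rescale the direction, so the exact numbers need not match this dashboard.

```python
def logit_weights(decoder_row, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumed convention (not confirmed by the dashboard):
      decoder_row: list of d_model floats, the feature's decoder direction.
      unembed:     d_model rows, each a list of vocab_size floats (W_U).
      vocab:       list of token strings, one per column of unembed.
    Returns (top_negative, top_positive), each a list of (token, value).
    """
    d_model = len(decoder_row)
    # logits[j] = sum_i decoder_row[i] * unembed[i][j]
    logits = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending: the head holds the most negative logits.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    # Reverse so the largest positive logits come first.
    top_positive = ranked[::-1][:k]
    return top_negative, top_positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    neg, pos = logit_weights(
        [1.0, -1.0],
        [[0.5, 0.0, 2.0], [1.0, -1.0, 0.0]],
        ["a", "b", "c"],
        k=1,
    )
    print(neg, pos)
```

Both lists come from a single ranking of the vocabulary, which is why the dashboard shows the k most negative and the k most positive entries side by side.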
Activation Density: 0.006%
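Activation density is commonly reported as the fraction of token positions on which the feature's activation is above zero, so 0.006% corresponds to about 6 firing positions per 100,000 tokens. The sketch below assumes that definition; `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's own threshold and token sample are not stated.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold.

    Assumes 'density' means the share of positions with activation > threshold
    (threshold 0.0 by default); this definition is an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 6 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_994 + [1.0] * 6
    density = activation_density(acts)
    print(f"{density * 100:.3f}%")
```

A very low density like this one means the feature is silent on almost every token, so its top activating examples are drawn from a small set of contexts.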