Explanations
numerical values or references to statistics and quantities
Negative Logits
favors     -0.19
honor      -0.18
honorable  -0.17
honors     -0.17
molding    -0.17
Harbor     -0.17
avior      -0.16
honoring   -0.16
theater    -0.16
colorful   -0.16
Positive Logits
sdale      0.15
é£         0.15
à¸¸à¸Ľ     0.14
éĥ         0.14
croft      0.14
|_|        0.14
headline   0.14
ben        0.14
HEMA       0.14
.exc       0.14
Activation density: 0.125%
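Activation density is commonly read as the fraction of token positions on which the feature fires. The sketch below assumes a 1-D array of the feature's activations over a token sample and counts a position as "firing" when its activation exceeds a hypothetical threshold of 0. It is an illustration under those assumptions, not the dashboard's exact procedure. One firing in 800 tokens gives the 0.125% shown above.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return float(np.mean(acts > threshold))

# Toy example: the feature fires on 1 of 800 token positions.
acts = np.zeros(800)
acts[3] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.125%
```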