Explanations
phrases indicating small quantities or degrees
Negative Logits
ed          -0.19
ftware      -0.16
somewhat    -0.16
edo         -0.15
eded        -0.15
δί          -0.15
slightly    -0.14
nt          -0.14
intended    -0.14
hi          -0.14
Positive Logits
/stdc       0.28
umen        0.27
.ly         0.25
mapped      0.21
Torrent     0.20
rary        0.20
ingly       0.20
umin        0.20
more        0.20
tern        0.18
Activations Density: 0.018%