Explanations
references to notes or annotations in text
Negative Logits
teenth     -0.18
nutshell   -0.17
misd       -0.15
uster      -0.15
nek        -0.15
stood      -0.15
maal       -0.15
soever     -0.15
ll         -0.14
iggs       -0.14
Positive Logits
books      0.32
book       0.27
ably       0.25
booking    0.24
lessly     0.23
able       0.22
edly       0.22
-taking    0.21
-worthy    0.21
Pad        0.21
Activations Density 0.045%