Explanations
phrases related to quantities or numbers
instances of the word "the" and other related quantifiers or descriptors
Negative Logits
atics       -0.63
ault        -0.62
mart        -0.62
            -0.61
ulated      -0.60
bg          -0.60
Supported   -0.60
Finally     -0.59
because     -0.59
wisely      -0.59
Positive Logits
entire      1.21
slightest   1.16
entirety    1.11
latter      1.06
remainder   1.03
wearer      1.03
same        1.03
widest      1.02
longest     1.01
quickest    0.99
Activations Density 0.245%
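
The page does not say how the density figure is computed. A common reading, taken here as an assumption, is that it is the fraction of token positions on which the feature is active (activation above zero). The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up toy data: 20,000 token positions, 49 of which are active.
toy_activations = [0.0] * 19951 + [1.0] * 49
print(f"{activation_density(toy_activations):.3%}")
```

On this reading, a density of 0.245% would mean the feature fires on roughly 1 token in 400.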