INDEX
Explanations
numerical references and quantifications related to counts, especially those emphasizing groups or categories
Negative Logits
Token      Logit
stood      -0.17
olls       -0.17
eter       -0.16
s          -0.15
enas       -0.15
stå        -0.14
stuff      -0.14
icens      -0.13
ummings    -0.13
tems       -0.12
Positive Logits
Token            Logit
fold             0.27
different        0.23
-fold            0.22
teenth           0.22
ancy             0.21
-dimensional     0.20
eenth            0.18
-legged          0.18
-digit           0.18
acity            0.17
Activations Density: 0.199%
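
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature has a nonzero activation; the function name, the threshold parameter, and the sample data are hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.199% would mean the feature fires on roughly 2 of every 1000 sampled tokens.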