INDEX
Explanations
phrases that indicate quantities and numerical data
NEGATIVE LOGITS
ολ      -0.16
ANDOM   -0.15
ile     -0.15
ý       -0.14
eltas   -0.14
venues  -0.14
OLA     -0.14
ola     -0.14
otos    -0.14
\CMS    -0.14
POSITIVE LOGITS
syll        0.18
times       0.18
fewer       0.17
steps       0.17
altogether  0.16
745         0.15
asty        0.15
ossa        0.15
ieder       0.15
votes       0.15
Activations Density: 0.103%