Explanations
adverbs indicating frequency or commonality
phrases indicating frequency or regularity of occurrences
Negative Logits
atur        -0.91
itivity     -0.86
ves         -0.82
uble        -0.80
atures      -0.80
ibility     -0.78
own         -0.78
idates      -0.76
itives      -0.76
leground    -0.73
Positive Logits
excluding    1.06
implying     1.01
preferring   0.96
indicating   0.91
numbering    0.91
suggesting   0.90
culminating  0.89
skipping     0.88
consisting   0.88
conclud      0.88
Activations Density: 0.146%
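
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and chosen only to illustrate the arithmetic:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: 100,000 tokens, 146 of which activate the feature.
toy_activations = [0.0] * 99_854 + [1.3] * 146
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints "0.146%"
```

Under this reading, a density of 0.146% would mean the feature is active on roughly 1 in 685 tokens.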