Explanations
phrases indicating alternatives or options
Negative Logits
oux       -0.16
quina     -0.15
andum     -0.15
wers      -0.15
illy      -0.15
ournal    -0.15
aille     -0.14
agma      -0.14
plant     -0.14
wer       -0.14
Positive Logits
than      0.18
besides   0.18
apart     0.17
кроме     0.17
umin      0.17
_than     0.17
except    0.16
Agenda    0.16
_ASSUME   0.15
than      0.15
Activations Density 0.065%