Explanations
phrases indicating exceptions or limitations
phrases emphasizing exceptions or limitations
Negative Logits
hoop       -0.67
merry      -0.63
onew       -0.63
anners     -0.62
hurry      -0.59
ocratic    -0.59
ocrats     -0.58
passers    -0.58
rity       -0.58
glide      -0.58
Positive Logits
LIMITED       0.93
limited       0.86
excluding     0.81
limitation    0.79
excluding     0.76
excluded      0.73
excludes      0.71
minus         0.71
exclusive     0.70
including     0.70
Activations Density: 0.102%