Explanations
differences or comparisons between items
comparative phrases that indicate differences or improvements between items
Negative Logits
erity      -0.88
ilic       -0.75
nton       -0.74
elfare     -0.72
illions    -0.71
igious     -0.70
services   -0.65
arcity     -0.65
efe        -0.65
igent      -0.65
Positive Logits
normal      1.31
usual       1.28
previous    1.26
regular     1.13
vanilla     1.11
originals   1.08
original    1.06
standard    1.05
ordinary    1.02
normal      0.99
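The page does not say how the positive and negative logit lists are produced. A common approach for sparse-autoencoder features is to project the feature's decoder direction onto the model's unembedding matrix and keep the highest- and lowest-scoring tokens. The sketch below assumes that method. `logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and the toy numbers are illustrative, not taken from this feature.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by how strongly a feature's decoder direction pushes
    their logits up (positive) or down (negative).

    Assumed (hypothetical) setup: decoder_dir has length d_model and
    unembed is a d_model x vocab_size matrix given as nested lists.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have d_model rows")
    scores = []
    for j, token in enumerate(vocab):
        # Direct logit effect: dot product of the decoder direction with column j.
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda ts: ts[1])
    negative = ranked[:k]        # most suppressed tokens, lowest score first
    positive = ranked[::-1][:k]  # most promoted tokens, highest score first
    return positive, negative


# Toy example with a 2-dimensional model and a 3-token vocabulary.
pos, neg = logit_effects([1.0, -1.0], [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]], ["a", "b", "c"], k=1)
print(pos, neg)  # [('a', 2.0)] [('b', -3.0)]
```

Under this reading, the tokens listed under Positive Logits ("normal", "usual", "standard", ...) would be the ones whose logits the feature raises most when it fires.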
Activation density: 0.305%
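Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0. The dashboard's exact criterion and sample size are not stated, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: density = (# tokens with activation > threshold) / (# tokens) * 100.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 3 of 1,000 tokens have a positive activation.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```

At the reported 0.305%, the feature would be active on roughly 3 tokens in every 1,000.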