INDEX
Explanations
comparisons indicating superiority or excellence
comparative phrases indicating superiority or improvement
Negative Logits
hran      -0.77
urther    -0.74
VT        -0.72
illary    -0.70
Juda      -0.70
uum       -0.69
ural      -0.67
Import    -0.67
xy        -0.66
Alt       -0.65
Positive Logits
average   0.84
abase     0.79
atever    0.77
ours      0.75
usual     0.74
average   0.73
anything  0.71
anybody   0.71
atos      0.69
Sponge    0.69
Activations Density: 0.059%