Explanations
phrases comparing quantity or degree
mentions of the word "less."
Negative Logits
DK            -0.73
ULE           -0.71
Grab          -0.71
Events        -0.68
◼             -0.68
Sequence      -0.68
Draft         -0.68
Properties    -0.67
Disclaimer    -0.66
TRY           -0.66
Positive Logits
than           1.17
ened           0.98
ening          0.96
forgiving      0.82
conspicuous    0.81
intrusive      0.79
costly         0.75
thumbnails     0.74
glamorous      0.73
invasive       0.72
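The two logit lists rank vocabulary tokens by how strongly this feature pushes each one up or down. This page does not say how those scores are computed. A common approach in sparse-autoencoder dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and k lowest entries. The sketch below is pure Python. The names `logit_effects` and `top_tokens` and the toy matrix are hypothetical illustrations, not this dashboard's code.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Direct logit effect of a feature on every token (assumed method).

    decoder_dir: the feature's decoder vector, length d_model.
    unembed: d_model x vocab matrix (rows = residual dims, columns = tokens).
    Returns one score per token: the dot product of decoder_dir with that
    token's unembedding column.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab)]


def top_tokens(effects: Sequence[float], tokens: Sequence[str], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or most
    negative (positive=False) logit effect, paired with their scores."""
    order = sorted(range(len(effects)), key=lambda t: effects[t], reverse=positive)
    return [(tokens[t], effects[t]) for t in order[:k]]


if __name__ == "__main__":
    # Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
    tokens = ["than", "DK", "less", "the"]
    decoder_dir = [1.0, 0.5]
    unembed = [[1.0, -0.5, 0.2, 0.0],
               [0.4, -0.6, 0.1, 0.0]]
    effects = logit_effects(decoder_dir, unembed)
    print("Positive:", top_tokens(effects, tokens, 2))
    print("Negative:", top_tokens(effects, tokens, 2, positive=False))
```

Sorting the token indices once per direction keeps the sketch short. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.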
Activation Density: 0.040%
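A density of 0.040% means the feature is active on roughly 4 of every 10,000 tokens, so it fires rarely. The sketch below assumes density is the fraction of sampled tokens whose activation is strictly greater than zero. The dashboard's actual sample size and threshold are not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation is strictly above `threshold`.

    Assumption: a token counts as "active" when its activation exceeds 0.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Toy sample: the feature fires on 4 of 10,000 tokens.
    acts = [0.0] * 9_996 + [2.5] * 4
    print(f"Activation density: {activation_density(acts):.3%}")  # prints 0.040%
```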