INDEX
Explanations
adverbs, particularly those ending in 'ly'
Negative Logits
GOODMAN    -0.92
ilater     -0.85
afety      -0.84
irlf       -0.83
eport      -0.75
ERY        -0.73
Emin       -0.72
Extrem     -0.72
arella     -0.70
itures     -0.70
Positive Logits
adv          0.97
priced       0.89
gged         0.86
supported    0.82
comed        0.79
present      0.78
tics         0.77
rewarded     0.77
rics         0.76
appreciated  0.74
Activations Density 0.031%