Explanations
phrases expressing caution or warning
assertions or statements indicating that ability does not imply permission or appropriateness
Negative Logits
throats      -0.83
bombed       -0.68
izen         -0.66
ebted        -0.64
matured      -0.63
ione         -0.62
specialize   -0.60
targets      -0.59
maneu        -0.59
ults         -0.59
Positive Logits
soType       0.96
disqual      0.92
etheless     0.80
entit        0.79
preclude     0.79
negate       0.73
imply        0.73
means        0.72
Cause        0.70
advertising  0.70
Activations Density: 0.169%