Explanations
negative statements or warnings
phrases indicating the distinction between capability and moral or practical obligation
Negative Logits
Token              Logit
ahime              -0.78
throats            -0.71
ebted              -0.70
bottled            -0.63
energies           -0.61
matured            -0.60
administered       -0.60
hel                -0.59
ãĥ¼ãĤ¯ (ーク)       -0.58
chairs             -0.58
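One entry in the Negative Logits list, ãĥ¼ãĤ¯, is a token shown in raw byte-level BPE form rather than as readable text. The sketch below assumes the model uses a GPT-2-style byte-level tokenizer (the page does not name the tokenizer). It rebuilds that tokenizer's byte-to-character table and maps the token back to UTF-8 bytes, which yields the katakana string ーク. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not taken from the source.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style reversible mapping from each byte value to a printable character."""
    # Bytes that are already printable characters map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted to code points >= 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Convert a byte-level BPE token string back into ordinary text."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Fragments that are not complete UTF-8 sequences become U+FFFD.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ¼ãĤ¯"))  # ーク
```

The same approach explains a positive-logit entry such as ¿½. Under the same tokenizer assumption, it maps to the bytes 0xBF 0xBD, which are a fragment of a multi-byte UTF-8 character and cannot be decoded on their own.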
Positive Logits
Token              Logit
soType             0.89
cause              0.79
disqual            0.77
¿½                 0.76
advertising        0.76
necessarily        0.75
Cause              0.75
ECA                0.74
automatically      0.74
etheless           0.73
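The page does not say how the logit lists are computed. A common approach for SAE feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the tokens with the largest and smallest resulting logits. The sketch below assumes that method and uses random stand-in weights with hypothetical shapes (`d_model`, `d_vocab`), so it shows only the ranking mechanics, not this feature's actual values.

```python
import numpy as np

# Hypothetical sizes and random weights standing in for a real model and SAE.
rng = np.random.default_rng(0)
d_model, d_vocab = 64, 1000
feature_direction = rng.normal(size=d_model)      # one feature's decoder row (assumed)
W_U = rng.normal(size=(d_model, d_vocab))         # unembedding matrix (assumed)


def top_logit_effects(direction, W_U, k=10):
    """Return (token ids, logits) for the k most positive and k most negative tokens."""
    logits = direction @ W_U                      # direct effect on each vocab logit
    order = np.argsort(logits)                    # ascending order
    neg_ids = order[:k]                           # most negative first
    pos_ids = order[::-1][:k]                     # most positive first
    return (pos_ids, logits[pos_ids]), (neg_ids, logits[neg_ids])


(pos_ids, pos_vals), (neg_ids, neg_vals) = top_logit_effects(feature_direction, W_U)
```

In a real dashboard, the ids would be decoded with the model's tokenizer to produce token lists like the two above.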
Activation Density: 0.154%
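Activation density is usually read as the share of token positions in a sample where the feature is active. The minimal sketch below assumes "active" means an activation strictly above zero; the page does not state the threshold or the size of the sample. In that toy example, a feature that fires on 154 of 100,000 tokens gives the reported figure of 0.154%. The function name `activation_density` is illustrative.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of positions where the feature activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy activations: the feature fires on 154 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:154] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.154%
```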