INDEX
Explanations
phrases indicating negation or denial
Negative Logits
ItemThumbnailImage    -0.72
selves                -0.70
throats               -0.66
interstitial          -0.65
Monitor               -0.63
stood                 -0.61
Tours                 -0.61
Mo                    -0.58
chambers              -0.58
arsen                 -0.57
Positive Logits
necessarily           1.29
advisable             0.92
suffice               0.88
uncommon              0.87
necess                0.86
condone               0.86
icable                0.82
feasible              0.82
eworthy               0.82
icably                0.81
Activations Density 0.094%