Explanations
contractions of the phrase "do not"
negative contractions and related phrases indicating refusal or non-compliance
Negative Logits
Token          Logit
Published      -0.67
Unc            -0.64
Cance          -0.63
newcom         -0.58
parting        -0.57
distraction    -0.56
doomed         -0.55
couch          -0.55
caster         -0.55
nearest        -0.55
Positive Logits
Token          Logit
't             1.43
ned            0.98
uts            0.95
eness          0.90
keys           0.90
ÃŃ             0.90
gered          0.87
itely          0.87
acio           0.86
its            0.83
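The token shown as ÃŃ in the positive-logit list looks like the raw vocabulary form used by GPT-2 style byte-level BPE tokenizers, where every byte is drawn as a printable character. The page does not say which tokenizer is in use, so this is an assumption. Under that assumption, the sketch below rebuilds the standard byte-to-character table and maps a vocabulary string back to text; ÃŃ would then correspond to the bytes C3 AD, which is the UTF-8 encoding of í. The function names are illustrative, not part of any library.

```python
def bytes_to_unicode():
    """Byte -> display-character table of GPT-2 style byte-level BPE (assumed tokenizer)."""
    # Bytes that are already printable keep their own code point:
    # '!'..'~' (0x21-0x7E), 0xA1-0xAC, and 0xAE-0xFF.
    printable = (
        list(range(0x21, 0x7E + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every other byte is shifted to code points 256, 257, ... in byte order.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


def decode_vocab_token(token):
    """Map a raw vocabulary string back to readable text (hypothetical helper)."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A token may hold only part of a multi-byte character, so do not fail on it.
    return raw.decode("utf-8", errors="replace")


print(decode_vocab_token("\u00c3\u0143"))  # the two characters displayed as ÃŃ
```

Plain ASCII tokens such as 't pass through unchanged, so only entries with unusual characters need this treatment.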
Activations Density 0.104%
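Activation density is usually read as the fraction of sampled tokens on which the feature is active. The sketch below shows that calculation under the assumption that "active" means an activation above zero; the page does not state the threshold or the sample it used, and the numbers here are made up for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [2.5]
print(f"{activation_density(sample):.3%}")  # prints 0.100%
```

A density near 0.1%, as reported above, means the feature fires on roughly one token in a thousand, which fits a feature tied to a narrow pattern such as negative contractions.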