INDEX
Explanations
contractions of "do not" or similar phrases expressing negative sentiment
negations in statements
Negative Logits
artney          -0.78
Engineers       -0.69
ersed           -0.68
Citiz           -0.67
Parenthood      -0.67
DeliveryDate    -0.65
Passage         -0.63
Compass         -0.63
pione           -0.62
atur            -0.61
Positive Logits
kidding         1.03
necessarily     0.93
ashamed         0.91
myself          0.90
sure            0.89
gonna           0.87
afraid          0.86
oooooooo        0.84
hin             0.83
believe         0.81
Activations Density 0.083%