INDEX
Explanations
phrases indicating critical judgment or evaluation
phrases conveying dissatisfaction or negative evaluations
Negative Logits
rather       -0.65
alm          -0.61
olutely      -0.60
azar         -0.59
ription      -0.59
usra         -0.57
esses        -0.57
cientious    -0.56
etermined    -0.56
eely         -0.56
Positive Logits
anymore      1.72
nor          1.33
either       1.02
whatsoever   0.90
nor          0.86
yet          0.86
anywhere     0.85
yet          0.85
Enough       0.78
necessarily  0.78
Activations Density: 0.293%