Explanations
words signaling certainty or emphasis
the word "definitely" and its variations, indicating strong affirmation
Negative Logits
acity       -0.80
glers       -0.79
mits        -0.78
bestos      -0.75
roups       -0.75
lings       -0.75
Reviewer    -0.74
ufact       -0.73
gencies     -0.70
umbn        -0.68
Positive Logits
identifiable      0.72
impacted          0.70
qualifies         0.70
differentiated    0.67
Vader             0.67
disqual           0.66
deline            0.66
detract           0.66
benefited         0.65
noticeable        0.65
Activations Density: 0.024%