Explanations
words related to strong agreement or emphasis
the term "definitely" and its variations to express certainty or affirmation
Negative Logits
roups      -0.87
ently      -0.86
sembly     -0.84
acity      -0.78
lings      -0.74
aciously   -0.73
ously      -0.71
mits       -0.70
Mour       -0.70
lete       -0.69
Positive Logits
underest        0.71
Vader           0.70
recommend       0.69
impacted        0.68
correlated      0.66
ove             0.65
influenced      0.64
qualifies       0.64
bothered        0.63
differentiated  0.63
Activations Density 0.030%