INDEX
Explanations
adverbs indicating certainty or strong emphasis
assertive language indicating certainty or definitiveness
Negative Logits
respectively   -0.77
entary         -0.76
ingly          -0.75
awaru          -0.71
atem           -0.70
oru            -0.70
roups          -0.68
glers          -0.67
ultimate       -0.66
ENCY           -0.65
Positive Logits
qualifies      0.80
deserved       0.73
wasn           0.73
not            0.73
NOT            0.69
wouldn         0.69
deserve        0.69
wont           0.68
deserves       0.68
weren          0.68
Activations Density 0.082%