Explanations
emphatic expressions of certainty or strong affirmation
Negative Logits
ison      -0.17
sdale     -0.17
pty       -0.16
retty     -0.15
ways      -0.15
la        -0.14
mie       -0.14
wald      -0.14
ocide     -0.14
lems      -0.14
Positive Logits
positively  0.22
OLUTE       0.21
olutely     0.21
utely       0.19
querque     0.18
-zero       0.18
olut        0.17
-таки       0.17
certainty   0.16
certain     0.16
Activation Density: 0.022%