Explanations
strong affirmatives and emphatic expressions of certainty
Negative Logits
Token        Logit
realmente    -0.24
really       -0.18
simply       -0.17
gerçekten    -0.17
actually     -0.17
einfach      -0.16
simplement   -0.16
wirklich     -0.16
actually     -0.15
really       -0.15
Positive Logits
Token        Logit
ながら       0.18
NOT          0.17
not          0.17
worth        0.16
leck         0.15
/random      0.15
iswa         0.15
wouldn       0.15
something    0.14
;y           0.14
Activations Density 0.064%
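
For a sense of scale, the density figure can be read as a firing rate. The sketch below is illustrative only and is not taken from the page: it assumes that density means the fraction of token positions where the feature's activation is above zero, and the names activation_density and tokens_per_activation, along with the sample values, are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" counts positions with activation above zero.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def tokens_per_activation(density_percent):
    """Convert a density given in percent into 'one activation per N tokens'."""
    if density_percent <= 0:
        raise ValueError("density_percent must be positive")
    return 100.0 / density_percent


# Hypothetical activations for 8 token positions; 2 of them fire.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.7, 0.0]
print(activation_density(sample))

# The reported density of 0.064%, expressed as tokens per activation.
print(tokens_per_activation(0.064))
```

Under that assumption, a density of 0.064% corresponds to the feature firing on roughly one token in every 1,560.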