Explanations
phrases indicating certainty or strong recommendation
Negative Logits
realmente
-0.26
actually
-0.24
actually
-0.23
basically
-0.22
Actually
-0.21
Actually
-0.21
wirklich
-0.21
竣
-0.20
aslında
-0.20
其实
-0.20
Positive Logits
wouldn
0.18
nowhere
0.17
;y
0.16
none
0.16
plenty
0.15
684
0.15
ながら
0.15
alguna
0.15
AMIL
0.15
worth
0.14
Activations Density 0.194%