Explanations
phrases indicating possibility or uncertainty
Negative Logits
presumably     -0.30
probably       -0.29
apparently     -0.27
probably       -0.25
Probably       -0.24
undoubtedly    -0.24
reportedly     -0.23
surely         -0.23
evidently      -0.23
zřejmě         -0.22
Positive Logits
someday        0.26
even           0.23
slightly       0.20
TOO            0.19
/pro           0.19
better         0.19
sogar          0.18
some           0.18
best           0.17
too            0.17
Activations Density 0.230%