INDEX
Explanations
pronouns that indicate the reader's potential experiences or actions
Negative Logits
oproject    -0.15
Mayer       -0.15
inspace     -0.15
erus        -0.15
umat        -0.14
imore       -0.14
ăm          -0.14
τικ         -0.14
toi         -0.14
نش          -0.14
Positive Logits
sure        0.23
_defs       0.19
'll         0.19
will        0.18
sure        0.18
guaranteed  0.16
Sure        0.16
’ll         0.16
certainty   0.16
WILL        0.15
Activations Density 0.059%
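
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of sampled token positions on which the feature's activation exceeds a threshold; that definition, the function name `activation_density`, and the sample data are all illustrative assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 6 of which activate the feature.
acts = [0.0] * 10000
for i in (3, 500, 1200, 4000, 7777, 9999):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

A density well below 1%, like the one reported above, would indicate under this definition that the feature fires on only a small share of tokens.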