INDEX
Explanations
modal verbs indicating future actions or capabilities
Negative Logits
Hubb     -0.16
/dc      -0.15
.qual    -0.15
atars    -0.14
will     -0.14
οι       -0.14
ffd      -0.14
áct      -0.14
://      -0.14
Elena    -0.14
Positive Logits
ingly    0.18
ulist    0.16
iam      0.16
IAM      0.16
fully    0.16
iams     0.16
amic     0.16
you      0.15
oom      0.15
owy      0.15
Activations Density 0.258%