INDEX
Explanations
phrases related to criticism and discrediting
ellipsis or unfinished thoughts
Negative Logits
fide        -0.67
outl        -0.67
merit       -0.67
oxide       -0.62
attest      -0.62
peripher    -0.62
cultiv      -0.61
opath       -0.61
claimants   -0.59
finance     -0.59
Positive Logits
until       1.26
but         1.18
they        1.15
again       1.15
yeah        1.13
BUT         1.13
yet         1.13
except      1.12
then        1.12
etc         1.12
Activations Density: 0.040%