    Explanations

    repeated or emphasized phrases, particularly those lacking specific content

    followed by punctuation or special characters

    legal citations and statutes

    Negative Logits
    -0.52  }$
    -0.52  <eos>
    -0.48  </b>
    -0.47  `
    -0.47   $
    -0.45   care
    -0.44  -
    -0.44   Las
    -0.43  2
    -0.43   do
    Positive Logits
    0.93   myſelf
    0.93   Efq
    0.88  \
    0.86  ConstraintMaker
    0.84   Houſe
    0.84   pleaſure
    0.83   Anſ
    0.82  InjectAttribute
    0.79   Theſe
    0.79  ſelf
    Activation Density: 0.045%

    No Known Activations