    Explanations

    punctuation marks and symbols

    Quotation marks followed by specific words

    Negative Logits
    Token             Logit
    "E"               -0.82
    "S"               -0.81
    "e"               -0.81
    "P"               -0.80
    "K"               -0.79
    "WriteLiteral"    -0.79
    "C"               -0.78
    "l"               -0.77
    "I"               -0.74
    "X"               -0.72
    Positive Logits
    Token             Logit
    " myſelf"         1.46
    " Theſe"          1.34
    " Jefus"          1.29
    " himſelf"        1.28
    "ſelves"          1.27
    " pleaſure"       1.26
    " themſelves"     1.23
    " uſed"           1.22
    " whoſe"          1.21
    " ainfi"          1.20
    Activation Density: 1.653%

    No Known Activations