    Explanations

    general terms and punctuation that may indicate formatting or structural elements in the text

    Negative Logits
     myſelf         -1.93
     itſelf         -1.91
     Efq            -1.81
     Monfieur       -1.79
    ſelf            -1.78
    ſelves          -1.78
     Theſe          -1.74
     Jefus          -1.74
     ―――――          -1.73
     themſelves     -1.72
    Positive Logits
                    1.45
    <eos>           1.23
    ↵↵              1.09
    '               1.02
    !               1.01
     .              1.01
    ...             1.00
     -              1.00
     a              0.99
     to             0.97
    Activation Density: 0.949%

    No Known Activations