INDEX
    Explanations

    the presence of text formatting markers or structural elements in the document

    Negative Logits
     myſelf      -1.39
     itſelf      -1.37
     Reſ         -1.32
     Anſ         -1.22
     Theſe       -1.21
     Houſe       -1.21
     Efq         -1.21
     Diſ         -1.19
     Monfieur    -1.19
     ―――――       -1.19
    Positive Logits
    0.73
    ↵↵
    0.69
     |
    0.63
     The
    0.61
    <eos>
    0.60
     •
    0.58
    <h1>
    0.57
    .
    0.57
    )))),
    0.56
    0.55
    Act Density 0.033%

    No Known Activations