    Explanations

    numeric values and references to key entities or concepts in the text

    Negative Logits
    Token    Logit
    " W"     -0.66
    " E"     -0.60
    " S"     -0.59
    " M"     -0.56
    " C"     -0.54
    " I"     -0.54
    " O"     -0.54
    " B"     -0.53
    " K"     -0.51
    " D"     -0.50
    Positive Logits
    Token         Logit
    " itſelf"     1.15
    " iſt"        1.14
    " Efq"        1.14
    " leſs"       1.13
    " Theſe"      1.13
    " ſever"      1.12
    " Anſ"        1.12
    "ſelves"      1.11
    " ſeveral"    1.11
    " Reſ"        1.10
    Activation Density: 1.914%

    No Known Activations