INDEX
    Explanations

    proper nouns, particularly names and titles

    Negative Logits
     faſt       -0.69
     erop       -0.65
     againſt    -0.65
     itſelf     -0.64
     iſt        -0.63
    ↵↵          -0.63
     uſe        -0.62
     preſent    -0.61
     abstracta  -0.60
     myſelf     -0.60
    Positive Logits
     G          1.08
     getM       1.00
    setH        0.99
     getB       0.98
     W          0.98
     O          0.97
     S          0.97
     M          0.96
     getP       0.96
     P          0.94
    Act Density 0.859%

    No Known Activations