    Explanations

    Proper nouns or named entities; the feature also seems somewhat sensitive to titles

    references to charitable organizations

    Project, organization, and title names

    Negative Logits
    Token                 Logit
    (blank)               -1.15
    ","                   -1.02
    " the"                -0.90
    " I"                  -0.88
    " in"                 -0.88
    " to"                 -0.87
    " that"               -0.85
    (blank)               -0.82
    " as"                 -0.82
    " ("                  -0.81

    Positive Logits
    Token                 Logit
    "expandindo"           1.54
    " Efq"                 1.45
    " Theſe"               1.41
    " itſelf"              1.35
    "disambiguazione"      1.31
    "<unused43>"           1.31
    "<unused14>"           1.30
    "<unused16>"           1.30
    "<unused8>"            1.30
    "<pad>"                1.30
    Act Density 3.431%
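
The density figure above is usually read as the share of sampled tokens on which the feature is active. The sketch below shows that arithmetic under stated assumptions: the function name `activation_density`, the `threshold` parameter, and the "strictly greater than zero" rule are illustrative choices, not the dashboard's documented method.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default). The dashboard may define
    this differently.
    """
    if len(activations) == 0:
        raise ValueError("activations must contain at least one value")
    # Count tokens on which the feature fires.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def as_percent(fraction: float, digits: int = 3) -> str:
    """Format a fraction as a percentage string, e.g. 0.25 -> '25.000%'."""
    return f"{fraction * 100:.{digits}f}%"
```

Under this reading, a density of 3.431% would mean the feature is active on roughly 34 of every 1,000 sampled tokens.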

    No Known Activations