INDEX
    Explanations

    proper nouns, specifically names of people and places

    Negative Logits
    Token                  Logit
    " calendriers"         -1.13
    " '\\;'"               -0.82
    "LookAnd"              -0.78
    "########."            -0.78
    "ReusableCell"         -0.77
    " beginnetje"          -0.76
    " utafitiHapana"       -0.76
    "setVerticalGroup"     -0.76
    "rrggbb"               -0.73
    Positive Logits
    Token                  Logit
    " Pert"                0.51
    " Futter"              0.47
    "arXiv"                0.47
    "thorpe"               0.46
    "naby"                 0.46
    " Bigg"                0.45
    " Jardim"              0.45
    " aud"                 0.44
    "forbes"               0.44
    " reck"                0.44
    Activation Density: 0.495%

    No Known Activations