INDEX
    Explanations

    names of famous people and places

    proper nouns, specifically names of individuals

    Negative Logits
     RIS         -0.82
     tremend     -0.76
     Corpus      -0.71
     occas       -0.69
     metic       -0.69
     pione       -0.68
     laun        -0.68
     exting      -0.67
     Citiz       -0.67
     enthusi     -0.65
    Positive Logits
    anyahu       0.96
    imore        0.94
    inson        0.92
    rick         0.86
    eret         0.86
    ison         0.86
    ridor        0.84
    rake         0.84
    cox          0.84
    rigan        0.84
    Activation Density: 0.148%

    No Known Activations