INDEX
    Explanations

    proper nouns related to political figures, sports teams, and locations

    Negative Logits
    ACTED     -0.80
    Lt        -0.78
    Redd      -0.73
     Spy      -0.72
    UGC       -0.71
    CG        -0.70
    ulhu      -0.68
    slave     -0.67
     HI       -0.66
    Si        -0.66
    Positive Logits
     Barron   0.96
    otyp      0.75
    alon      0.75
    asso      0.75
    abad      0.75
    cloth     0.74
    agus      0.71
    xual      0.70
    mares     0.70
    agraph    0.70
    Act Density 0.263%

    No Known Activations