    Explanations

    names of people or places

    proper nouns, particularly names

    Negative Logits
    Token           Logit
    "bound"         -0.73
    "bre"           -0.67
    "breaking"      -0.61
    "position"      -0.60
    "ivated"        -0.60
    "hillary"       -0.57
    "stri"          -0.57
    "breakers"      -0.56
    "imm"           -0.56
    "con"           -0.56
    Positive Logits
    Token           Logit
    "'s"            0.70
    " herself"      0.69
    " Sr"           0.67
    " commented"    0.64
    "stals"         0.64
    "ites"          0.63
    "iev"           0.62
    "inen"          0.61
    "enegger"       0.61
    " KE"           0.61
    Activation Density: 0.290%

    No Known Activations