    Explanations

    references to famous personalities and entities such as politicians, celebrities, and sports figures in news articles

    Negative Logits
    Token         Logit
    "mbuds"       -0.74
    "ipedia"      -0.73
    "noon"        -0.71
    "fo"          -0.64
    " Helpful"    -0.63
    "veyard"      -0.61
    " Crimean"    -0.60
    "english"     -0.60
    "Duration"    -0.60
    " Females"    -0.58
    Positive Logits
    Token           Logit
    " opted"        0.90
    " underwent"    0.87
    " joked"        0.85
    " survived"     0.84
    " admits"       0.83
    " tweeted"      0.82
    " insists"      0.80
    " penned"       0.79
    " endured"      0.78
    " wore"         0.78
    Activation Density: 0.168%

    No Known Activations