    Explanations

    names of individuals and entities in news articles

    proper nouns related to people and their affiliations

    Negative Logits
    Token             Logit
    " likes"          -0.64
    "---------"       -0.64
    " demands"        -0.59
    " DOES"           -0.58
    " bends"          -0.58
    "needs"           -0.58
    " persists"       -0.57
    " wakes"          -0.57
    " conqu"          -0.57
    " doesnt"         -0.56

    Positive Logits
    Token             Logit
    " respectively"   1.70
    " jointly"        1.17
    " were"           1.15
    " discuss"        1.14
    " are"            1.11
    " collaborate"    1.05
    "were"            1.05
    "both"            1.03
    " collide"        1.01
    " both"           0.99

    Activation Density: 0.373%

    No Known Activations