INDEX
    Explanations

    names of celebrities or public figures

    mentions of proper names and surnames

    Negative Logits
    Token             Logit
    "agonist"         -0.63
    "yip"             -0.61
    " Eater"          -0.59
    "ersive"          -0.58
    "allery"          -0.55
    "orney"           -0.54
    "ymes"            -0.54
    "netflix"         -0.54
    "yton"            -0.54
    "vertisements"    -0.53
    Positive Logits
    Token             Logit
    "anc"              0.68
    "ANC"              0.64
    "ans"              0.59
    "ois"              0.53
    "cest"             0.53
    "apt"              0.51
    "Going"            0.48
    "cia"              0.48
    "ousse"            0.47
    "ahi"              0.47
    Act Density 0.208%

    No Known Activations