    Explanations

    references to celebrities

    Negative logits (tokens are quoted so that leading spaces, which are part of the token, stay visible)
        "doors"           -0.71
        "choes"           -0.70
        "hematic"         -0.69
        " Wind"           -0.69
        "nerg"            -0.66
        "uv"              -0.66
        "atives"          -0.66
        "empty"           -0.65
        "nda"             -0.64
        "rt"              -0.64

    Positive logits
        "rities"           1.14
        "wcs"              1.05
        " celebrities"     1.00
        " endors"          0.97
        " celebrity"       0.93
        " endorsements"    0.89
        " celeb"           0.85
        " gossip"          0.81
        " superstar"       0.78
        " Celebrity"       0.77
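The two lists above are the tokens at the extremes of the feature's per-token logit scores. A minimal sketch of how such extremes can be picked out, assuming the scores are already available as a token-to-score mapping (how the scores themselves are computed is not stated on this page and is not shown). The `effects` dict is a small excerpt copied from the lists above, and `top_and_bottom_k` is a hypothetical helper.

```python
import heapq


def top_and_bottom_k(effects: dict[str, float], k: int):
    """Return the k highest- and k lowest-scoring (token, score) pairs.

    Hypothetical helper: assumes `effects` maps each token string
    (leading space included) to its logit score for the feature.
    """
    top = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return top, bottom


# Small excerpt of the scores listed above, not the full vocabulary.
effects = {
    " celebrities": 1.00,
    "rities": 1.14,
    " gossip": 0.81,
    "doors": -0.71,
    " Wind": -0.69,
    "nerg": -0.66,
}

top, bottom = top_and_bottom_k(effects, 2)
print(top)     # [('rities', 1.14), (' celebrities', 1.0)]
print(bottom)  # [('doors', -0.71), (' Wind', -0.69)]
```

Using `heapq.nlargest`/`nsmallest` avoids sorting the whole mapping, which matters when the scores cover an entire vocabulary rather than this six-token excerpt.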
    Activation density: 0.012%
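A density of 0.012% means the feature is very sparse. A minimal sketch of the arithmetic, assuming density is the percentage of token positions whose activation exceeds a threshold of zero (the page does not state the exact definition or threshold). The activation values below are hypothetical: 12 active positions out of 100,000 tokens.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumed definition; the dashboard does not specify its threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 12 nonzero activations spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")  # 0.012%
```

In other words, at this density the feature would be expected to fire on roughly one token in every 8,300.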

    No Known Activations