INDEX
    Explanations

    mentions of being at the top

    references to "top" positions or rankings

    Negative Logits
    "gm"             -0.71
    "ellow"          -0.66
    "ija"            -0.64
    "fw"             -0.63
    "ouri"           -0.62
    "itta"           -0.61
    "selves"         -0.60
    "ewitness"       -0.60
    " Parenthood"    -0.58
    " fiance"        -0.58
    Positive Logits
    "most"           1.06
    " level"         0.86
    " thereof"       0.82
    "liest"          0.76
    "mast"           0.75
    " tier"          0.75
    "side"           0.75
    " end"           0.74
    "loader"         0.73
    " of"            0.72
    Activation Density: 0.047%

    No Known Activations