    Explanations

    mentions of things being or becoming popular

    references to the concept of popularity

    Negative Logits
        "¯¯"          -0.86
        "erm"         -0.82
        "ibur"        -0.79
        "abol"        -0.76
        "alk"         -0.74
        "cise"        -0.74
        "holes"       -0.71
        "htaking"     -0.69
        "gans"        -0.69
        "rib"         -0.68
    Positive Logits
        " popularity"  1.11
        "popular"      0.91
        " Popular"     0.86
        "yip"          0.84
        " ratings"     0.80
        " ubiqu"       0.77
        " unpopular"   0.75
        " renown"      0.75
        "龍契士"       0.75
        " diffusion"   0.74
    Activation Density: 0.009%

    No Known Activations