INDEX
    Explanations

    words related to preferences or things that are liked or favored

    words that indicate popularity or preference

    Negative Logits
    "ural"              -0.85
    "inas"              -0.78
    "ijk"               -0.77
    "ufact"             -0.76
    "TPPStreamerBot"    -0.76
    "amping"            -0.73
    "arty"              -0.72
    "heed"              -0.72
    "abad"              -0.69
    "acial"             -0.68
    Positive Logits
    " favorites"        0.94
    " favorite"         0.91
    "Favorite"          0.84
    " haunt"            0.81
    " Favor"            0.81
    " Favorite"         0.79
    "é¾įå"              0.77
    " favourites"       0.77
    "é¾įå¥ij士"          0.76
    "è¦"                0.74
    Act Density 0.013%

    No Known Activations