    Explanations

    words related to promotions, giveaways, contests, events, and announcements

    terms related to entertainment media and subscription services

    Negative Logits
    Token          Logit
    "âĹ¼"          -0.99
    "é¾įå"         -0.95
    "ãģĦ"          -0.91
    "76561"        -0.87
    "ãģĹ"          -0.87
    "ãģĨ"          -0.86
    "omething"     -0.85
    "ktop"         -0.84
    "ippi"         -0.82
    "20439"        -0.80
    Positive Logits
    Token                                  Logit
    " âĵĺ"                                 0.91
    " ~"                                   0.76
    "↵↵"                                   0.74
    " ----"                                0.73
    " **"                                  0.72
    " --------------------------------"    0.71
    " ·"                                   0.71
    " wise"                                0.69
                                           0.69
    " |"                                   0.68
    Activation Density: 0.497%

    No Known Activations