INDEX
    Explanations

    social media and internet-related content, including comments, posts, and pictures

    Negative Logits
    ividual       -0.94
    gren          -0.75
     aven         -0.74
    igenous       -0.74
    agher         -0.73
    amia          -0.71
    ãģ®éŃĶ        -0.70
    oliberal      -0.70
    anwhile       -0.69
    astered       -0.67

    Positive Logits
    ł             1.26
    ª             1.19
    «             1.14
    ¥             1.14
    ¦             1.12
    ¡             1.05
    £             1.05
    ï¸            1.05
    Ľ             1.04
    Ģ             1.01
    Act Density 0.169%

    No Known Activations