INDEX
    Explanations

    references to social media interactions and users

    Negative Logits
    Token         Logit
    openh         -0.19
    Ã             -0.18
    ohon          -0.15
    urbed         -0.14
    abbo          -0.14
     roundup      -0.14
    alah          -0.14
    vide          -0.14
    olith         -0.14
    afe           -0.14

    Positive Logits
    Token         Logit
    89            0.25
    79            0.24
    87            0.24
    73            0.23
    82            0.23
    23            0.23
    01            0.23
    istrovství    0.23
    69            0.23
    85            0.22
    Activation Density: 0.092%

    No Known Activations