INDEX
    Explanations

    mentions of social media platforms and charity-related terms

    Negative Logits
    Token             Logit
    "enterOuterAlt"   -0.48
    "CloseOperation"  -0.47
    " käyt"           -0.45
    " plegable"       -0.40
    " durata"         -0.40
    " gydy"           -0.40
    " mijne"          -0.38
    " sánh"           -0.38
    " ejus"           -0.38
    " ocasião"        -0.38
    Positive Logits
    Token             Logit
    " twitter"        0.70
    " neuro"          0.70
    "neuro"           0.63
    " nervous"        0.63
    " Neuro"          0.61
    " Twitter"        0.60
    " culti"          0.60
    " tweet"          0.60
    "Neuro"           0.59
    " neurological"   0.59
    Activation Density: 0.193%

    No Known Activations