INDEX
    Explanations

    explaining, stating, or talking about

    Negative Logits
    Ι       0.43
            0.41
            0.40
    Всім    0.39
            0.39
    Тер     0.39
    ണ്      0.38
    發展    0.38
    白色    0.38
    狀態    0.38
    Positive Logits
     commenters    0.52
     YouTube       0.51
     Reddit        0.51
     NPR           0.51
     tweeted       0.50
     LinkedIn      0.49
     Email         0.48
     BuzzFeed      0.48
     Reuters       0.48
     emailed       0.47
    Act Density 0.001%

    No Known Activations