    Explanations

    deeply harmful or dangerous

    Negative Logits
    能够  0.41
    (blank)  0.40
    ையுடன்  0.39
     అందించ  0.37
    但也  0.37
    สามารถ  0.36
    有时候  0.36
    (blank)  0.36
    包含  0.36
     included  0.35
    Positive Logits
     horrific      0.74
     dreadful      0.73
     disgraceful   0.72
     inept         0.71
     appalling     0.70
     hopelessly    0.70
     unacceptable  0.69
     horrendous    0.69
     dismal        0.69
     disgusting    0.68
    Act Density 0.558%
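
The density figure above is commonly read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only; it assumes "fires" means an activation strictly above zero, and the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 tokens, 5 of which activate the feature.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"Act Density {density:.3f}%")  # prints "Act Density 0.500%"
```

Under this reading, a density of 0.558% would correspond to roughly 56 active tokens per 10,000 sampled.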

    No Known Activations