INDEX
    Explanations

    harmful, unethical, racist, sexist, toxic, dangerous, or illegal

    Negative Logits
    " ২০২২"    0.70
    " २०२२"    0.66
    "🫶"       0.64
    "🥹"       0.60
    "🫣"       0.60
    "🫢"       0.56
    "🫠"       0.54
    " ২০২১"    0.53
    "🥲"       0.52
    "🪄"       0.51
    Positive Logits
    " coronavirus"     1.30
    " Coronavirus"     1.28
    "Coronavirus"      1.24
    "coronavirus"      1.19
    " कोरोनावायरस"     0.97
    " коронави"        0.95
    " করোনাভাই"        0.93
    " corona"          0.89
    " Corona"          0.87
    "corona"           0.82
    Act Density 0.005%

    No Known Activations