    Explanations

    harming/threatening

    Negative Logits
    Token             Logit
    "threatening"     -0.87
    " threatening"    -0.83
    " endangering"    -0.82
    " harmful"        -0.77
    " damaging"       -0.74
    " endanger"       -0.69
    " harming"        -0.69
    " hazard"         -0.66
    " colorir"        -0.65
    " dañ"            -0.63
    Positive Logits
    Token             Logit
    "stateProvider"   0.63
    "OrNil"           0.59
    "jspx"            0.59
    "nsic"            0.58
    " the"            0.56
    " préc"           0.56
                      0.55
    "Portale"         0.55
    " ri"             0.53
    "RegistryLite"    0.53
    Activation Density: 0.170%

    No Known Activations