INDEX
    Explanations

    requests for inappropriate or harmful content.

    Negative Logits
     загру          0.43
     loads          0.42
     overloaded     0.42
     riches         0.42
     impatient      0.41
    Nodes           0.41
     отлич          0.40
                    0.40
     overloading    0.40
                    0.40
    Positive Logits
     harmless       0.84
     legitt         0.79
     legitimate     0.73
     legít          0.73
     permissible    0.71
     lawful         0.71
    あくまで        0.68
     lawfully       0.67
     innocuous      0.67
     respectful     0.64
    Act Density 1.771%

    No Known Activations