    Explanations

    terms and phrases related to safety and security

    Negative Logits
    Token             Logit
    "yarnpkg"         -0.49
    " mídia"          -0.46
    "Tikang"          -0.43
    " Opus"           -0.43
    "lungen"          -0.43
    " typewriter"     -0.43
    " subscribers"    -0.42
    "Itr"             -0.42
    "llus"            -0.42
    " mustache"       -0.41

    Positive Logits
    Token             Logit
    "Safe"            1.21
    "Safety"          1.16
    " Safe"           1.15
    "safe"            1.09
    "safety"          1.09
    " Safety"         1.08
    " SAFETY"         1.08
    "SAFE"            1.05
    " SAFE"           1.05
    " safety"         1.04

    Activation Density: 0.068%

    No Known Activations