INDEX
    Explanations

    terms related to safety in various contexts

    Negative Logits
    Token             Logit
    " cheminée"       -0.40
    " embreagem"      -0.40
    " chaqueta"       -0.40
    " derra"          -0.39
    " vom"            -0.39
    " written"        -0.38
    "Multiplier"      -0.38
    " temporary"      -0.38
    " joining"        -0.38
    "Written"         -0.38

    Positive Logits
    Token             Logit
    " Safety"         1.04
    "Safety"          1.00
    "safety"          0.98
    " SAFETY"         0.92
    " safety"         0.90
    "afety"           0.86
    "SAFETY"          0.80
    "安全"            0.64
    " veiligheid"     0.63
    "SAFE"            0.60
    Act Density 0.005%
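
Assuming "Act Density" means the fraction of token positions on which the feature fires (an interpretation; the page itself does not define the term), 0.005% works out to about 1 token in 20,000. The following is a minimal Python sketch of that calculation. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions where the feature is active.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 5 of which activate the feature.
sample = [0.0] * 99_995 + [1.2, 0.7, 3.1, 0.4, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.005%
```

A feature this sparse is consistent with the "No Known Activations" note: a small sample of text may simply contain no positions where it fires.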

    No Known Activations