INDEX
    Explanations

    terms and phrases related to safety

    Negative Logits
    Token             Logit
    " **)"            -0.81
    " Protector"      -0.75
    " McCulloch"      -0.75
    " protection"     -0.74
    " Protect"        -0.74
    "ựng"             -0.74
    " Marcello"       -0.73
    "drück"           -0.73
    "følgelig"        -0.73
    " PROTECTION"     -0.72
    Positive Logits
    Token             Logit
    " Safe"           1.11
    " unsafe"         1.10
    " SAFE"           1.10
    " safely"         1.09
    " safe"           1.09
    " safer"          1.05
    "unsafe"          1.05
    " safest"         1.03
    "SAFE"            1.02
    "Safe"            1.01
    Act Density 0.026%

    No Known Activations
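
    The activation density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only. It assumes density means "fraction of tokens with activation above a threshold", which the dashboard itself does not state. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature is
    active. The dashboard does not define the term, so this is one
    plausible reading, not its confirmed method.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 tokens, 3 of which activate the feature.
toy_activations = [0.0] * 9997 + [0.4, 1.2, 0.7]
density = activation_density(toy_activations)
print(f"Act Density {density:.3%}")
```

    Under this reading, a density of 0.026% would correspond to roughly 26 active tokens per 100,000 sampled.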