INDEX
    Explanations

    phrases related to safety and security

    references to safety in various contexts

    Negative Logits
    (tokens are quoted so that leading spaces are visible)
    Token        Logit
    "dx"         -0.72
    "yi"         -0.69
    " Fiber"     -0.68
    "ordan"      -0.67
    "issance"    -0.67
    "frey"       -0.66
    "essee"      -0.65
    "attr"       -0.64
    "hour"       -0.63
    "bender"     -0.63
    Positive Logits
    (tokens are quoted so that leading spaces are visible)
    Token            Logit
    " safe"          1.12
    " Safe"          0.89
    "safe"           0.84
    " havens"        0.84
    " evacuation"    0.80
    " safest"        0.79
    " safely"        0.79
    " safer"         0.77
    " Haram"         0.76
    "saf"            0.76
    Activation Density: 0.013%

    No Known Activations