INDEX
    Explanations

    strings related to harmful or dangerous activities

    terms associated with health risks and conditions

    Negative Logits
    Token           Logit
    " Daylight"     -0.63
    " Solitaire"    -0.59
    "OTOS"          -0.59
    "Ctrl"          -0.56
    "UTF"           -0.56
    ":]"            -0.55
    "Priv"          -0.55
    " initials"     -0.55
    " Sirius"       -0.55
    "nih"           -0.53
    Positive Logits
    Token           Logit
    "roying"        1.19
    "renched"       1.08
    "itored"        1.03
    "ielding"       0.92
    "usting"        0.92
    "ASED"          0.90
    "iddled"        0.89
    "ained"         0.88
    "quartered"     0.88
    "umping"        0.87
    Act Density 0.091%

    No Known Activations