INDEX
    Explanations

    warnings against bad behavior

    mentions of wrongdoing or criminal behavior (cheating, theft, arrests, risk/being caught, or requests for illicit advice).

    Negative Logits
     lng
    -0.07
     такой
    -0.07
    .ob
    -0.07
    uracion
    -0.06
    -0.06
     перес
    -0.06
    ानन
    -0.06
    .xticks
    -0.06
     jov
    -0.06
    -0.06
    Positive Logits
     advent
    0.07
    _LIB
    0.06
    Z
    0.06
    DMI
    0.06
    Rpc
    0.06
     numerator
    0.06
    ief
    0.06
     Mu
    0.06
    vented
    0.06
    0.06
    Act Density 0.076%

    No Known Activations