INDEX
    Explanations

    avoiding negative outcomes

    Negative Logits
        "_smart"                -0.07
        "ipment"                -0.07
        "führt"                 -0.06
        " denom"                -0.06
        " SEXP"                 -0.06
        [run of tab characters] -0.06
        "orst"                  -0.06
        "literal"               -0.06
        [token not rendered]    -0.06
        "ेद"                    -0.06
    Positive Logits
        " taj"                  0.06
        "結果"                  0.06
        " favorable"            0.06
        "دمة"                   0.06
        "interest"              0.06
        " puss"                 0.06
        " коли"                 0.06
        " optimism"             0.06
        [token not rendered]    0.06
        " Compatibility"        0.06
    Act Density 0.098%
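    Activation density is usually read as the percentage of sampled tokens on which the latent's activation is above zero, so 0.098% means the latent fires on roughly 1 token in 1,000. The sketch below illustrates that reading only. It assumes a plain list of per-token activations and a threshold of 0; the function name `activation_density` and the sample data are hypothetical and are not taken from the dashboard's own code.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes `acts` holds one activation value per token.
    """
    if len(acts) == 0:
        return 0.0
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Hypothetical data: 10,000 tokens, of which the latent fires on 10.
acts = [0.0] * 10_000
for i in range(10):
    acts[i] = 1.5

print(f"Act Density {activation_density(acts):.3f}%")  # Act Density 0.100%
```

    A density this low marks a sparse latent. That fits a dashboard with no stored activation examples to display.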

    No Known Activations