INDEX
    Explanations

    references to advantages or positive outcomes

    Negative Logits
    Token          Logit
    " Lw"          -0.72
    " שוליים"      -0.68
    "dymyr"        -0.66
    "betical"      -0.63
    " Geiger"      -0.63
    "yty"          -0.60
    " userDao"     -0.60
    " ráp"         -0.58
    " mkdir"       -0.58
    "stom"         -0.58

    Positive Logits
    Token          Logit
    " benefits"    2.29
    " Benefits"    2.05
    " benefit"     2.01
    "benefits"     1.89
    "Benefits"     1.88
    "Benefit"      1.85
    " Benefit"     1.85
    "benefit"      1.83
    " BENEFITS"    1.80
    "BENEFITS"     1.69
    Activation Density: 0.072%

    No Known Activations