    Explanations

    negative expressions or dismissals

    Negative Logits
    Token               Logit
    "waf"               -0.59
    "ticles"            -0.58
    "ArgumentParser"    -0.54
    "OGND"              -0.53
    " lomb"             -0.52
    " Bender"           -0.50
    "هما"               -0.50
    "Népesség"          -0.49
    " charbon"          -0.48
    " Jacinto"          -0.48
    Positive Logits
    Token               Logit
    " not"              1.19
    "not"               0.97
    " NOT"              0.95
    " Not"              0.94
    "NOT"               0.91
                        0.90
    "Not"               0.89
    " не"               0.86
    " không"            0.79
    "ไม่"               0.79
    Activation Density: 0.151%

    No Known Activations