INDEX
    Explanations

    negative implications or consequences associated with various situations or actions

    Negative Logits
    "ulhu"             -0.76
    " Flavoring"       -0.72
    "iage"             -0.71
    " Transparency"    -0.68
    " deduction"       -0.67
    " clause"          -0.67
    " SHARES"          -0.66
    " Curve"           -0.65
    " tweet"           -0.64
    "vernment"         -0.64
    Positive Logits
    "compatible"       1.18
    "enough"           1.11
    "eligible"         1.08
    "aware"            1.02
    "focused"          1.01
    "tested"           1.01
    "producing"        0.98
    "eyed"             0.98
    "dependent"        0.94
    "years"            0.94
    Activation Density: 0.074%

    No Known Activations