Negative Logits
    " Dense"          -0.07
    "ugi"             -0.06
    " celib"          -0.06
    ".baseUrl"        -0.06
    "application"     -0.06
    " reportedly"     -0.06
    " Nikola"         -0.06
    ".Configuration"  -0.06
    " Ki"             -0.06
    "Ki"              -0.06
Positive Logits
    " unfair"          0.09
    " fair"            0.08
    " Fair"            0.08
    "fair"             0.07
    [missing]          0.07
    "pair"             0.07
    "210"              0.07
    "líd"              0.07
    " fairness"        0.07
    "campaign"         0.06
    Activation density: 0.011%

    No Known Activations