INDEX
    Explanations

    words related to corruption or unfair practices

    references to corruption in various contexts

    Negative Logits
    Token           Logit
    "abwe"          -0.71
    " Flavoring"    -0.70
    "cule"          -0.66
    "zig"           -0.66
    " Anxiety"      -0.65
    "ciation"       -0.65
    "gap"           -0.65
    "Downloadha"    -0.64
    "yip"           -0.63
    "ches"          -0.63
    Positive Logits
    Token       Logit
    "ly"        1.15
    "ible"      1.02
    "ions"      1.00
    "nesses"    0.94
    "NESS"      0.94
    "ness"      0.91
    "ibly"      0.88
    "ingly"     0.85
    "glers"     0.83
    "iated"     0.80
    Activation Density: 0.040%

    No Known Activations