INDEX
    Explanations

    prejudiced discrimination

    Negative Logits
    " to"  0.73
    (token not shown)  0.73
    "nabla"  0.68
    "ă"  0.64
    (token not shown)  0.63
    "ienes"  0.62
    " যেটি"  0.61
    (token not shown)  0.61
    "|^{"  0.60
    "!"  0.59
    Positive Logits
    "ي"  1.05
    "in"  1.01
    "ת"  1.00
    "el"  0.99
    "י"  0.93
    "ad"  0.91
    "at"  0.91
    "та"  0.90
    (token not shown)  0.85
    "us"  0.84
    Act Density 0.070%

    No Known Activations