    Negative Logits
    Token          Logit
    "ди"           0.79
    (missing)      0.70
    "ك"            0.70
    "خ"            0.69
    "ڈ"            0.66
    "ية"           0.64
    "ul"           0.64
    "مي"           0.64
    (missing)      0.64
    "ق"            0.64
    Positive Logits
    Token            Logit
    ")."             0.59
    "的同时"          0.59
    " ethnicities"   0.59
    (whitespace)     0.55
    " Of"            0.53
    " ethnicity"     0.51
    " Preventing"    0.51
    " <"             0.50
    " Racial"        0.50
    " Detecting"     0.49
    Activation Density: 0.022%

    No Known Activations