    Explanations

    racial superiority claims

    Negative Logits
    Token          Logit
    "מ"            1.49
    (not shown)    1.27
    "א"            1.16
    " powied"      1.13
    "};\"          1.09
    "т"            1.07
    "ت"            1.07
    (not shown)    1.03
    "ב"            1.03
    "ח"            1.02

    Positive Logits
    Token          Logit
    "-"            1.31
    (not shown)    1.21
    "."            1.09
    "u"            1.02
    " superior"    0.97
    "ar"           0.93
    "ang"          0.91
    "atma"         0.90
    "ill"          0.89
    "arı"          0.89

    Activation Density: 0.003%

    No Known Activations