INDEX
    Explanations

    biological or social stereotypes

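    Negative Logits
        " appar"              0.44
        "Automatically"       0.44
        "自动"                0.42   (Chinese: "automatic")
        " decorations"        0.42
        " Earrings"           0.42
        " automaticamente"    0.42   (Portuguese/Italian: "automatically")
        " Physiology"         0.41
        "传播"                0.41   (Chinese: "spread, propagate")
        " غالب"               0.40   (Arabic: "predominant, mostly")
        "enses"               0.39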
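    Positive Logits
        " زیرمه"              0.58
        "ТИ"                  0.54
        "uza"                 0.52
        "en"                  0.52
        " දැන"                0.52
        "inicial"             0.51   (Spanish/Portuguese: "initial")
        "-}\"                 0.50
        " ஒன்றாக"             0.50   (Tamil: "together")
        " किताब"              0.49   (Hindi: "book")
        " requerido"          0.49   (Spanish/Portuguese: "required")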
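Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive vocabulary entries. The dashboard does not state its exact method, so the following is only a minimal sketch under that assumption; `top_logit_tokens`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, not part of any dashboard API. Scores are returned signed.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return (negative, positive) lists of (token, score) pairs.

    Assumed (hypothetical) inputs:
      decoder_dir: feature decoder vector, shape (d_model,)
      W_U: unembedding matrix, shape (d_model, vocab_size)
      vocab: token strings, length vocab_size
    """
    if len(vocab) != W_U.shape[1]:
        raise ValueError("vocab length must match W_U.shape[1]")
    # Direct effect of the feature direction on every output logit.
    scores = decoder_dir @ W_U
    # Ascending order: most negative scores first.
    order = np.argsort(scores)
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    # Reverse the order to take the most positive scores first.
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage: with an identity unembedding, scores equal the decoder entries.
neg, pos = top_logit_tokens(np.array([0.1, -0.5, 0.3, 0.0]), np.eye(4),
                            ["a", "b", "c", "d"], k=2)
print(neg)  # tokens "b" then "d"
print(pos)  # tokens "c" then "a"
```

Ranking by the raw projection ignores later layers and the final layer norm, which is why such lists can disagree with a feature's textual explanation.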
    Activation Density: 0.001%
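Activation density is usually read as the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (typical for ReLU-style features) and that densities are reported as percentages; `activation_density` is a hypothetical helper, not the dashboard's code. At 0.001%, a feature fires about once per 100,000 tokens.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    acts = np.asarray(acts, dtype=float)
    if acts.size == 0:
        return 0.0
    # Mean of a boolean mask = fraction of positions that fire.
    return 100.0 * float(np.mean(acts > threshold))


# One firing position out of 100,000 tokens.
acts = np.zeros(100_000)
acts[42] = 1.3
print(activation_density(acts))  # ≈ 0.001
```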

    No Known Activations