INDEX
    Explanations

    terms related to social or political power dynamics and their implications

    Negative Logits
                        -0.45
    off                 -0.42
                        -0.42
    aarrggbb            -0.41
    ve                  -0.41
     bata               -0.40
    TextInputLayout     -0.38
    def                 -0.38
    teau                -0.37
                        -0.36
    Positive Logits
     endforeach         0.82
    uidado              0.79
    rítica              0.77
    };*/                0.76
    yship               0.74
     تانيه              0.73
     >=",               0.73
    ]='\                0.71
    Còn                 0.71
    liothèque           0.69
    Activation Density: 0.619%
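
A density figure like the one above is usually read as the fraction of sampled token positions on which the feature fires. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the toy data are all assumptions for illustration and are not taken from the dashboard or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens with a
    non-zero (above-threshold) feature activation.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: 1000 token positions, 6 of them active.
sample = [0.0] * 1000
for i in (3, 250, 251, 600, 777, 912):
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.600%"
```

Under this reading, a density of 0.619% would correspond to the feature firing on roughly 6 of every 1000 sampled tokens.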

    No Known Activations