    Explanations

    terms related to race and societal issues

    racial epithets and slurs

    Negative Logits
    Token              Logit
    "ButterKnife"      -0.39
    " operacional"     -0.38
    "operational"      -0.36
    " operational"     -0.36
    " emocion"         -0.36
    " vyš"             -0.35
    " esperamos"       -0.35
    "RTLI"             -0.34
    " Concorde"        -0.34
    " transporte"      -0.34
    Positive Logits
    Token                 Logit
    " betweenstory"       0.53
    " препратки"          0.51
    (token not shown)     0.48
    "ooga"                0.45
    " resourceCulture"    0.44
    (token not shown)     0.44
    "addGap"              0.43
    "modb"                0.42
    " szóci"              0.42
    "𝒯"                   0.42
    Activation Density: 0.026%

    No Known Activations