    Explanations

    words and phrases that are politically charged or provocative, sometimes related to racial issues or stereotypes.

    internet content

    Negative Logits
    migrationBuilder    -0.73
     jajaja             -0.63
    principalColumn     -0.60
    tagHelperRunner     -0.60
     فريبيس             -0.60
    ConstraintMaker     -0.59
     مرئيه              -0.59
     romantique         -0.58
    WithMany            -0.56
    ?!!                 -0.56

    Positive Logits
     Мексичка            0.54
     cal                 0.54
    cessite              0.50
    UTRAL                0.50
    eeper                0.48
    Personensuche        0.46
     Ar                  0.46
    getWriter            0.46
     ar                  0.45
     pure                0.45
    Activation Density: 2.524%

    No Known Activations