INDEX
    Explanations

    derogatory and inflammatory language towards specific groups or individuals

    negatively charged insults and criticism

    Negative Logits
    انجليز            -0.43
    Tikang            -0.40
    afficheront       -0.38
    tasse             -0.36
     hemd             -0.34
     sonst            -0.34
     Cycles           -0.34
    ensement          -0.33
     Laramie          -0.33
     Ghent            -0.33
    Positive Logits
     utafitiHapana    0.55
    AddTagHelper      0.53
     <<<<<<<<<<<<<<   0.48
     fucking          0.45
    TagMode           0.43
    __(/*!            0.43
    windowFixed       0.42
     referrerpolicy   0.42
     damned           0.42
    SerializedSize    0.41
    Activation Density: 0.297%

    No Known Activations