INDEX
    Explanations

    occurrences of words related to swearing or vulgar expressions

    Negative Logits
    νή          -0.16
    idis        -0.16
    urer        -0.16
    raf         -0.16
    apsed       -0.16
    κρι         -0.16
    inement     -0.15
    unte        -0.15
    bris        -0.15
    echn        -0.15
    Positive Logits
    stakes      0.21
    ies         0.17
    endor       0.16
    artz        0.16
    ombat       0.16
    ollen       0.16
    itzer       0.16
    enburg      0.15
    sock        0.15
    enberg      0.15
    Act Density 0.056%

    No Known Activations