INDEX
    Explanations

    phrases related to derogatory remarks

    instances of the word "sn"

    Negative Logits
    Token             Logit
    "heid"            -0.87
    " limited"        -0.70
    " maj"            -0.64
    " medium"         -0.62
    " und"            -0.61
    " Cond"           -0.60
    "EMENT"           -0.60
    " respectfully"   -0.60
    " belts"          -0.59
    " Ind"            -0.59
    Positive Logits
    Token             Logit
    "iping"           1.44
    "uggle"           1.43
    "ipe"             1.41
    "ugg"             1.41
    "appy"            1.37
    "atches"          1.35
    "atching"         1.35
    "arling"          1.34
    "ipes"            1.33
    "ickers"          1.33
    Activation Density: 0.033%

    No Known Activations