    Explanations

    expressions of dialogue, particularly those indicating emphasis or strong feelings

    speak about individuals or groups in a derogatory or condescending manner

    Negative Logits
     Seym         -0.75
     mathemat     -0.69
     tabloid      -0.68
     limb         -0.68
     ivory        -0.68
     seiz         -0.67
     Gardens      -0.67
     amusement    -0.66
     metic        -0.66
     tasting      -0.66
    Positive Logits
    ï¸ı           1.06
    rd            0.99
    lean          0.95
    resent        0.93
    deg           0.89
    vernment      0.89
    ï¸            0.85
    PB            0.84
    audi          0.84
    ãĥĥãĥī        0.84
    Act Density 0.030%

    No Known Activations