INDEX
    Explanations

    words related to negative actions or characteristics, specifically focusing on petty behavior

    references to trivial or minor offenses

    Negative Logits
    "Downloadha"     -0.89
    "ahead"          -0.78
    "hov"            -0.77
    "ioch"           -0.76
    "hetti"          -0.76
    "ources"         -0.76
    "avez"           -0.74
    "heimer"         -0.73
    "Recomm"         -0.73
    "igslist"        -0.72
    Positive Logits
    " petty"          0.91
    "cipled"          0.79
    " theft"          0.77
    " arithmetic"     0.72
    " Petty"          0.72
    " misdemeanor"    0.72
    " provocation"    0.67
    " tru"            0.66
    " fel"            0.66
    " Theft"          0.66
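The page does not say how the logit lists are computed. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the resulting logit effect. The sketch below uses plain Python lists; the names `logit_effects` and `top_logits` and the toy weights are hypothetical, not the dashboard's actual code.

```python
from typing import List, Sequence, Tuple

def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot a feature direction (length d_model) with each column of
    unembed (d_model rows x vocab columns), giving one score per token."""
    d_model = len(direction)
    vocab = len(unembed[0])
    return [sum(direction[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab)]

def top_logits(direction: Sequence[float],
               unembed: Sequence[Sequence[float]],
               tokens: Sequence[str],
               k: int = 10) -> Tuple[list, list]:
    """Return (positive, negative): the k tokens the direction promotes
    most and the k it suppresses most, as (token, score) pairs."""
    effects = logit_effects(direction, unembed)
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative

# Toy example (hypothetical 2-dimensional model with a 4-token vocabulary).
direction = [1.0, -0.5]
unembed = [[0.9, -0.8, 0.1, 0.3],
           [0.0, 0.4, 0.2, -0.6]]
tokens = [" petty", "ahead", " theft", "hov"]
positive, negative = top_logits(direction, unembed, tokens, k=2)
print(positive)
print(negative)
```

Sorting once and slicing from both ends yields both lists in a single pass over the scores. Real dashboards may rescale or normalize these values, so the magnitudes above are not comparable to the ones listed on this page.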
    Activation Density 0.014%
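The page does not define activation density. One common definition, assumed here, is the percentage of sampled token positions at which the feature's activation is above zero. At 0.014%, that is roughly 14 firing positions per 100,000 tokens. The function name and threshold parameter below are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds threshold."""
    acts = list(activations)
    if not acts:
        return 0.0  # no samples: report zero rather than divide by zero
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# Hypothetical sample: 14 positive activations among 100,000 token positions.
sample = [0.0] * 99_986 + [1.0] * 14
print(activation_density(sample))
```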

    No Known Activations