INDEX
    Explanations

    words related to things that are considered inappropriate, shocking, or upsetting

    instances of the word "offensive" in various contexts

    Negative Logits
    "chell"        -0.96
    "Deal"         -0.77
    "aret"         -0.77
    "omed"         -0.72
    "ho"           -0.71
    "omething"     -0.71
    "perature"     -0.71
    " Cind"        -0.67
    "bourg"        -0.66
    "luck"         -0.66
    Positive Logits
    " offensive"       0.77
    "ity"              0.75
    " thrust"          0.72
    " posture"         0.71
    "thouse"           0.70
    " linemen"         0.67
    "ities"            0.67
    " against"         0.64
    " guessing"        0.64
    " contraception"   0.63
    Act Density 0.023%

    No Known Activations