INDEX
    Explanations

    profane and offensive language

    words related to bodily actions and expressions, often with a humorous or vulgar tone

    NEGATIVE LOGITS
    eting           -0.69
    eters           -0.64
     subcontract    -0.63
    ipers           -0.62
    eder            -0.59
     Guards         -0.59
    ļéĨĴ            -0.58
    rence           -0.57
    atu             -0.57
     ration         -0.56
    POSITIVE LOGITS
    vana            0.82
     lucky          0.71
     bit            0.69
    licks           0.68
     darn           0.67
    THING           0.66
     messy          0.66
     unlucky        0.64
    anked           0.64
     luck           0.63
    Act Density 0.316%
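
The page does not define the activation density figure above. A common reading is the fraction of token positions on which the feature's activation is above zero. The sketch below works under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample activations are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations: the feature fires on 3 of 1000 token positions.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3%}")  # prints 0.300%
```

Under this reading, a density of 0.316% would mean the feature fires on roughly 3 of every 1000 tokens in the sampled text.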

    No Known Activations