    Explanations

    mentions of "dirty" or morally questionable actions/concepts

    references to "dirty" or unethical practices and behaviors

    Negative Logits
    Token           Logit
    "*/("           -1.20
    "istically"     -0.82
    "itech"         -0.80
    "XT"            -0.78
    "aic"           -0.77
    "isol"          -0.76
    "HCR"           -0.75
    "ãĥĦ"           -0.74
    "izations"      -0.73
    "uther"         -0.73
    Positive Logits
    Token           Logit
    " laundry"      1.23
    " tricks"       1.08
    " linen"        1.06
    " diapers"      0.97
    " rotten"       0.93
    " dirty"        0.92
    " diaper"       0.88
    " luc"          0.86
    " trick"        0.85
    " dishes"       0.83
    Activation Density: 0.075%

    No Known Activations