INDEX
    Explanations

    mentions of things being imitated or imitating something else

    words related to "immoral" or "immorality"

    Negative Logits
    "OPLE"          -0.68
    " Morales"      -0.67
    "escription"    -0.67
    "Downloadha"    -0.66
    " Sack"         -0.66
    " Rav"          -0.65
    "NetMessage"    -0.65
    "ttes"          -0.63
    " Wales"        -0.62
    "ij士"          -0.61
    Positive Logits
    "itating"       1.15
    "balanced"      1.14
    "mer"           1.12
    "manent"        1.08
    "itates"        1.08
    "itations"      1.06
    "itated"        1.05
    "mers"          1.00
    "bal"           0.98
    "bec"           0.93
    Activation density: 0.021%

    No Known Activations