INDEX
    Explanations

    violent or aggressive actions or intentions

    actions related to punishment and control

    Negative Logits
    Token            Logit
    "ItemTracker"    -0.57
    "Downloadha"     -0.56
    "ãĤ´ãĥ³"         -0.55
    "dating"         -0.53
    " dotted"        -0.53
    "nih"            -0.52
    " Nos"           -0.51
    " fashioned"     -0.50
    "arij"           -0.50
    "ebook"          -0.49

    Positive Logits
    Token            Logit
    "uate"           0.80
    "ISE"            0.63
    "enance"         0.62
    "ulate"          0.59
    " them"          0.58
    "itate"          0.58
    " him"           0.58
    "igate"          0.58
    " oneself"       0.55
    " RELEASE"       0.55
    Activation Density: 0.660%

    No Known Activations