INDEX
    Explanations

    words indicating actions done with intent or purpose

    terms related to intentional or deliberate actions

    Negative Logits
    Token          Logit
    " Rite"        -0.85
    "busters"      -0.69
    " HERO"        -0.67
    " ANGEL"       -0.67
    " Warriors"    -0.66
    " Tycoon"      -0.65
    "Rated"        -0.63
    "iry"          -0.63
    " Colleges"    -0.62
    " Emir"        -0.62
    Positive Logits
    Token              Logit
    " deliberately"    1.05
    " intentionally"   0.96
    " planted"         0.92
    " purposely"       0.90
    " purposefully"    0.85
    " reprodu"         0.82
    " plotted"         0.82
    " disreg"          0.80
    " indul"           0.77
    " misrepresent"    0.77
    Activation Density: 0.009%

    No Known Activations