INDEX
    Explanations

    terms related to intentional actions or behaviors

    occurrences of the words "deliberate" and "intentional."

    Negative Logits
    "WB"             -0.73
    "href"           -0.70
    "asta"           -0.68
    "Rated"          -0.68
    " Tycoon"        -0.68
    " Kinnikuman"    -0.67
    "amy"            -0.67
    " Neighbor"      -0.67
    "models"         -0.66
    "ĻĤ"             -0.65
    Positive Logits
    " deliberate"     1.05
    " intentional"    0.90
    "ãĥĥãĤ¯"          0.77   (byte-level form of ック)
    " deliber"        0.73
    "theless"         0.73
    " foul"           0.71
    " disson"         0.70
    " attempt"        0.67
    " wrongdoing"     0.66
    " drift"          0.65
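    A minimal sketch of how logit lists like the two above are commonly produced. It assumes each value is the feature's decoder direction projected through the model's unembedding matrix, with no final layer norm applied. The page does not state its exact method. The names `top_logit_effects`, `decoder_dir`, `W_U` and `vocab` are hypothetical, and the toy numbers are made up.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how strongly one feature direction raises or lowers their logits.

    Hypothetical inputs (not defined on this page):
      decoder_dir: shape (d_model,), the feature's decoder vector
      W_U:         shape (d_model, vocab_size), the unembedding matrix
      vocab:       list of token strings, indexed by token id
    Assumes no final layer norm or scaling is applied before the unembedding.
    """
    # Direct logit effect of the feature on every vocabulary token.
    effects = decoder_dir @ W_U  # shape (vocab_size,)
    order = np.argsort(effects)  # token ids, most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 5-token vocabulary (made-up numbers).
    vocab = ["a", " deliberate", "WB", " intentional", "models"]
    W_U = np.array([[0.1, 0.8, -0.6, 0.5, -0.2],
                    [0.0, 0.5, -0.2, 0.6, -0.3]])
    positive, negative = top_logit_effects(np.array([1.0, 0.5]), W_U, vocab, k=2)
    print("positive:", positive)
    print("negative:", negative)
```

    Sorting once and reading the result from both ends yields the top-k positive and top-k negative tokens from a single pass over the vocabulary.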
    Activation Density: 0.011%
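    Activation density is usually the share of token positions on which the feature's activation is above zero. A density of 0.011% is about 11 tokens per 100,000, or roughly one token in 9,000. The following is a minimal sketch under that definition. It assumes a hypothetical array `acts` of per-token activations and a firing threshold of 0, because the page names neither the corpus nor the threshold. The function name `activation_density` is also hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions on which a feature fires.

    acts: per-token activations of one feature over a sample corpus
          (hypothetical input; the page names neither corpus nor threshold).
    threshold: activations strictly above this count as firing (assumed 0).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


if __name__ == "__main__":
    # 11 firing positions out of 100,000 reproduces a 0.011% density.
    acts = np.zeros(100_000)
    acts[:11] = 1.0
    print(f"{activation_density(acts):.3f}%")
```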

    No Known Activations