INDEX
    Explanations

    phrases related to deception and trickery

    various forms of the word "deceive."

    Negative Logits
    " Yor"         -0.71
    " Brands"      -0.67
    " Bots"        -0.64
    "ingen"        -0.63
    " Targ"        -0.62
    " Aires"       -0.62
    " Polo"        -0.62
    " Zamb"        -0.60
    " Atkinson"    -0.60
    "CHA"          -0.59
    Positive Logits
    "ffect"        1.04
    "ither"        0.95
    "ptive"        0.93
    "pt"           0.92
    "emonic"       0.92
    "iving"        0.91
    "ased"         0.90
    "fficient"     0.88
    "ivable"       0.87
    "astrous"      0.86
    Act Density 0.094%

    No Known Activations