    Explanations

    phrases related to luring or baiting

    language related to deception or entrapment

    Negative Logits
    Token         Logit
    "urity"       -0.82
    "blance"      -0.73
    "oret"        -0.69
    "yrus"        -0.69
    "eas"         -0.69
    "olitan"      -0.68
    "ppard"       -0.67
    "eely"        -0.66
    "ias"         -0.65
    "iator"       -0.65

    Positive Logits
    Token         Logit
    " lure"       1.08
    " bait"       1.03
    "ument"       0.96
    "mong"        0.90
    "glers"       0.84
    "EStream"     0.82
    " Wag"        0.78
    "GGGGGGGG"    0.76
    "crow"        0.71
    "ãĥĦ"         0.70

    Activation Density: 0.036%

    No Known Activations