INDEX
    Explanations

    words related to deception and disguises

    references to illusions and disguises

    NEGATIVE LOGITS
    Token         Logit
    "uled"        -0.72
    "aldi"        -0.69
    "orough"      -0.69
    "Ü"           -0.66
    "capacity"    -0.65
    "vez"         -0.64
    "acid"        -0.63
    " Issues"     -0.63
    "olved"       -0.62
    "iaries"      -0.62
    POSITIVE LOGITS
    Token           Logit
    " deceive"      1.10
    " disgu"        1.00
    " mir"          0.94
    " illusion"     0.90
    " deception"    0.85
    " disguise"     0.84
    " camoufl"      0.84
    " pas"          0.82
    "querade"       0.81
    " Illusion"     0.80
    Activation Density: 0.070%

    No Known Activations