INDEX
    Explanations

    Appearances may be deceptive

    Negative Logits
    Token             Logit
    " Clamp"          -0.08
    " exclusiva"      -0.07
    " compartilh"     -0.07
    " opio"           -0.07
    " klimaat"        -0.07
    ".Clamp"          -0.07
    " exclusivo"      -0.07
    "Clamp"           -0.07
    " clamp"          -0.07
    " lähe"           -0.07
    Positive Logits
    Token             Logit
    " deceptive"      0.19
    " decept"         0.16
    " superfic"       0.14
    " dece"           0.14
    " aparent"        0.14
    " enga"           0.13
    "隐藏"            0.13
    " disguised"      0.13
    " disguis"        0.13
    " deception"      0.13
    Activation Density: 0.078%

    No Known Activations