INDEX
    Explanations
    NEGATIVE LOGITS
    " confirmations"    -0.08
    " кис"              -0.08
    " ultrap"           -0.08
    " equation"         -0.08
    " pessoais"         -0.08
    "ñe"                -0.08
    " люд"              -0.08
    " insanlar"         -0.08
    "Confirm"           -0.08
    " গেলে"             -0.07
    POSITIVE LOGITS
    " camouflage"       0.15
    "ouflage"           0.12
    "oufl"              0.10
    " deception"        0.09
    "Matching"          0.09
    "embedding"         0.09
    " Matching"         0.09
    " Match"            0.09
    "Colors"            0.09
    " vivid"            0.09
    Activation Density: 0.005%

    No Known Activations