INDEX
    Explanations

    Gestalt principles

    Negative Logits
     국민          -0.06
     Sanders     -0.06
    çak          -0.06
     dated       -0.06
    hire         -0.06
    _detected    -0.06
                 -0.06
    oğlu         -0.06
    sweet        -0.06
    แค           -0.06

    Positive Logits
     cosmetic     0.07
    _MUX          0.07
     team         0.06
                  0.06
     setting      0.06
     small        0.06
    -specific     0.06
    .tim          0.06
     smoothly     0.06
     monks        0.06

    Act Density 0.008%

    No Known Activations