INDEX
    Explanations
    NEGATIVE LOGITS
    ".people"      -0.07
    " potatoes"    -0.07
    " покры"       -0.07
    "theory"       -0.07
    " Powerful"    -0.07
    "_task"        -0.06
    "ेड"           -0.06
    " Пет"         -0.06
    " deleted"     -0.06
    "ücret"        -0.06
    POSITIVE LOGITS
    "кин"          0.07
    " stencil"     0.07
    "ENCIL"        0.07
    "(substr"      0.06
    "olist"        0.06
    "struction"    0.06
    "loe"          0.06
    "ostel"        0.06
    "Stencil"      0.06
    " ):↵"         0.06
    Act Density 0.001%

    No Known Activations