    NEGATIVE LOGITS
    Token            Logit
    "groupBox"       -0.07
    " destac"        -0.07
    " міг"           -0.07
    "-haspopup"      -0.07
    " sigue"         -0.07
    " almak"         -0.06
    (not shown)      -0.06
    "uncios"         -0.06
    " масло"         -0.06
    " birinci"       -0.06
    POSITIVE LOGITS
    Token            Logit
    "uth"            0.06
    " Hubb"          0.06
    " Jen"           0.06
    "teacher"        0.06
    " Angle"         0.06
    " stra"          0.06
    ".white"         0.06
    (not shown)      0.06
    "learn"          0.06
    " Williamson"    0.06
    Act Density 0.127%

    No Known Activations