    NEGATIVE LOGITS
    " Rank"       -0.07
    "Wars"        -0.06
    "ncy"         -0.06
    " Radius"     -0.06
    " Snowden"    -0.06
    " Shop"       -0.06
    "}};↵"        -0.06
    ".Field"      -0.06
    "nish"        -0.06
    "view"        -0.06
    POSITIVE LOGITS
    " permet"     0.09
    "ray"         0.07
    "817"         0.06
    " obrig"      0.06
    " orig"       0.06
    " придется"   0.06
    " crit"       0.06
    " trag"       0.06
    " metab"      0.06
    " треть"      0.06
    Activation Density: 0.003%

    No Known Activations