INDEX
    Explanations
    Negative Logits
    " президент"        -0.07
    "ningar"            -0.06
    " beneficial"       -0.06
    (token not shown)   -0.06
    " dependence"       -0.06
    (token not shown)   -0.06
    " strategically"    -0.06
    " courage"          -0.06
    "zo"                -0.06
    "Touch"             -0.06
    Positive Logits
    "minent"     0.08
    " Sask"      0.07
    " scenery"   0.07
    "ASE"        0.07
    "FTA"        0.06
    " آسی"       0.06
    "�m"         0.06
    "Parms"      0.06
    " Kub"       0.06
    ".tc"        0.06
    Activation Density: 0.002%

    No Known Activations