    Negative Logits
        " ih"       -0.08
        " favors"   -0.08
        " उसे"      -0.07
        " بین"      -0.07
        " Bac"      -0.07
        ".IC"       -0.07
        " يد"       -0.07
        " HQ"       -0.07
        " даль"     -0.07
                    -0.07
    Positive Logits
        "Sec"       0.07
        " baptism"  0.07
        " ndu"      0.07
        "891"       0.07
        "truct"     0.07
        " lum"      0.07
        "473"       0.06
        "BEST"      0.06
        " Además"   0.06
        " seg"      0.06
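    The negative and positive logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The sketch below shows how such lists are commonly derived. It assumes, without confirmation from this page, that each score is the dot product of the feature's decoder direction with the corresponding column of the model's unembedding matrix. The helper name `feature_logit_effects` and the toy numbers are hypothetical. A real model would use an array library rather than nested lists.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper. Assumes the score for token j is
    sum_i decoder_direction[i] * unembedding[i][j], i.e. the feature's
    decoder vector projected through an unembedding matrix of shape
    [d_model][vocab_size].
    """
    d_model = len(decoder_direction)
    if len(unembedding) != d_model:
        raise ValueError("unembedding must have one row per model dimension")
    # One score per vocabulary entry: column j of the unembedding dotted
    # with the decoder direction.
    scores = [
        sum(decoder_direction[i] * unembedding[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative effects.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example with made-up numbers (not taken from this feature).
neg, pos = feature_logit_effects(
    decoder_direction=[1.0, -1.0],
    unembedding=[[0.5, 0.0, -0.2], [0.1, 0.3, 0.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

    Sorting once and slicing from both ends keeps the example short. For a full vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.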
    Activation Density: 0.037%
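    The page reports an activation density of 0.037%. Under the usual reading, this means the feature fires on a very small share of tokens. The sketch below assumes density is the percentage of sampled token positions where the activation is strictly above a threshold (0 by default). The exact definition and the token sample behind this page's figure are not stated. `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds threshold.

    Hypothetical helper; assumes "density" means the fraction of sampled
    tokens with activation > threshold, reported in percent.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation to compute a density")
    return 100.0 * active / total


# Illustration only (not this feature's data): 37 firing positions
# among 100,000 sampled tokens.
sample = [1.0] * 37 + [0.0] * (100_000 - 37)
print(f"{activation_density(sample):.3f}%")  # prints 0.037%
```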

    No Known Activations