INDEX
    Explanations
    Negative Logits
    Token              Logit
    "ạp"               -0.06
    "ocop"             -0.06
                       -0.06
    "やる"             -0.06
    "uckles"           -0.06
    "арод"             -0.06
    " friday"          -0.06
    "ViewItem"         -0.06
    "rons"             -0.06
    "Mont"             -0.06
    Positive Logits
    Token              Logit
    " beneficiaries"   0.07
    " destination"     0.07
    "гар"              0.07
    " цей"             0.07
    " estão"           0.07
                       0.07
    " تم"              0.07
    " abc"             0.06
    "(target"          0.06
    "=output"          0.06
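    The page does not say how these logit values were computed. On feature dashboards of this kind, they are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and listing the smallest and largest entries. The following is a minimal sketch under that assumption; `top_bottom_logits`, `decoder_row`, `W_U` and `vocab` are hypothetical names, not the dashboard's actual code.

```python
from __future__ import annotations

import numpy as np


def top_bottom_logits(decoder_row: np.ndarray, W_U: np.ndarray,
                      vocab: list[str], k: int = 10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption (hypothetical helper): the feature's effect on the logits is its
    decoder direction (shape: d_model) multiplied by the unembedding matrix
    W_U (shape: d_model x vocab_size).
    """
    logits = decoder_row @ W_U           # one score per vocabulary token
    order = np.argsort(logits)           # token indices, smallest logit first
    negative = [(vocab[j], float(logits[j])) for j in order[:k]]
    positive = [(vocab[j], float(logits[j])) for j in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
W_U = np.array([[0.5, -0.2, 0.1, -0.6],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_bottom_logits(np.array([1.0, 0.0]), W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.6), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

    With the real model, `k=10` would yield two ten-row lists like the ones above. Tokens whose logit values nearly tie, such as the many -0.06 entries, can appear in an arbitrary order.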
    Activation Density 0.013%
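    An activation density of 0.013% means the feature fires on about 13 of every 100,000 tokens (0.013% = 0.00013 = 13 / 100,000). The sketch below computes that percentage from an array of per-token activations. It assumes that "active" means an activation strictly above zero; the page does not state its threshold, and `activation_density` is a hypothetical name.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where a feature's activation exceeds threshold.

    Assumption: a token counts as "active" when its activation is > threshold (default 0).
    """
    acts = np.asarray(acts, dtype=float)
    return float(100.0 * np.count_nonzero(acts > threshold) / acts.size)


# 13 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:13] = 1.0
print(activation_density(acts))  # 0.013
```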

    No Known Activations