INDEX
    Explanations
    NEGATIVE LOGITS
    Token     Logit
    "1"       0.45
    "ك"       0.35
    " and"    0.32
    "the"     0.31
    "9"       0.31
    "কে"      0.29
    "NE"      0.29
    "at"      0.28
    "و"       0.28
    "на"      0.28
    POSITIVE LOGITS
    Token                   Logit
    "িনবার্গ"               0.29
    "أة"                    0.29
    " médicament"           0.29
    "állítás"               0.27
    " soirée"               0.26
    (token not captured)    0.25
    " mediator"             0.25
    " gambler"              0.25
    "addassa"               0.25
    " nargs"                0.25
    Activation Density: 0.462%

    No Known Activations