INDEX
    Explanations
    NEGATIVE LOGITS
    الت  0.74
    О  0.73
    cout  0.66
    0.66
    Src  0.65
    aju  0.65
    اه  0.64
    یشه  0.63
    cd  0.63
    которые  0.63
    POSITIVE LOGITS
     on  0.81
     as  0.80
    f  0.77
     of  0.75
     or  0.73
     intake  0.72
     bebidas  0.70
    ra  0.70
    it  0.69
    es  0.69
    Act Density 0.093%

    No Known Activations