INDEX
    Explanations
    Negative Logits
        Token         Logit
        " by"         1.45
        " in"         1.37
        " The"        1.22
        "t"           1.20
        " of"         1.17
        "p"           1.17
        "م"           1.15
        "1"           1.11
        (not shown)   1.11
        " was"        1.10
    Positive Logits
        Token         Logit
        "ir"          1.24
        (not shown)   1.12
        "-"           0.92
        "be"          0.90
        "ot"          0.86
        " dàng"       0.83
        "이나"         0.82
        "it"          0.82
        (not shown)   0.81
        " می‌توانید"   0.80
    Activation Density: 0.002%

    No Known Activations