INDEX
    Negative Logits
    " ("  1.04
    " It"  0.81
    " A"  0.80
    "দিন"  0.71
    " אחד"  0.69
    "A"  0.68
    " Isabelle"  0.66
    "haha"  0.65
    " exposé"  0.64
    "ද්ධ"  0.64
    Positive Logits
    1.20
    1.17
    " in"  1.15
    1.14
    1.09
    " σε"  1.03
    " در"  0.93
    "ز"  0.90
    " في"  0.89
    "もら"  0.89
    Activation Density: 1.037%

    No Known Activations