INDEX
    Explanations
    Negative Logits
    (blank)            1.45
    "،"                1.28
    " 因為"            1.04
    "ுள்ளார்"          1.02
    " ،"               1.00
    (blank)            1.00
    (blank)            0.98
    (blank)            0.97
    " contemplates"    0.96
    " لكرة"            0.92
    Positive Logits
    "i"       1.62
    "et"      1.58
    "ig"      1.32
    "o"       1.29
    "as"      1.28
    "d"       1.27
    "ib"      1.23
    "ي"       1.22
    "n"       1.20
    "etre"    1.18
    Act Density 0.063%

    No Known Activations