    Negative Logits
    Logit   Token
    2.15    (blank)
    2.14    "taxi"
    1.97    "r"
    1.89    "ques"
    1.87    " също"
    1.79    "or"
    1.77    " fundo"
    1.77    " вида"
    1.74    " оно"
    1.73    " telles"
    Positive Logits
    Logit   Token
    2.42    "ه"
    2.23    (blank)
    2.22    "cknowled"
    2.21    "воен"
    2.11    (blank)
    2.10    "на"
    2.06    (blank)
    2.06    " Pertama"
    2.05    "ب"
    2.04    (blank)
    Act Density 0.067%

    No Known Activations