    NEGATIVE LOGITS
    Token               Logit
    " as"               1.13
    "et"                1.03
    " a"                0.94
    " "                 0.86
    " and"              0.84
    "он"                0.82
    "ه"                 0.82
    "G"                 0.79
    "a"                 0.77
    "A"                 0.77

    POSITIVE LOGITS
    Token               Logit
    " mainly"           1.17
    " głównie"          1.14
    " principalement"   1.11
    " hauptsächlich"    1.03
    "mainly"            0.98
    " Mainly"           0.98
    " principalmente"   0.94
    "u"                 0.91
    "та"                0.87
    " primarily"        0.87

    Activation Density: 0.055%

    No Known Activations