INDEX
    Explanations
    NEGATIVE LOGITS
    Token                Logit
    " d"                 0.51
    " na"                0.51
    " i"                 0.50
    " naj"               0.49
    " system"            0.48
    (blank in source)    0.48
    "d"                  0.47
    " ut"                0.47
    "ne"                 0.47
    "u"                  0.47
    POSITIVE LOGITS
    Token         Logit
    " Polish"     1.42
    "Polish"      1.37
    "Poland"      1.37
    " Poland"     1.35
    " Поль"       1.32
    " поль"       1.30
    " polish"     1.26
    " Pologne"    1.16
    "polish"      1.11
    " poln"       1.09
    Activation Density: 0.015%

    No Known Activations