INDEX
    Explanations
    Negative Logits
    Token      Logit
    ED         1.31
    ке         1.27
     którzy    1.27
    ش          1.25
    ed         1.24
    ει         1.22
    amp        1.18
    во         1.14
    (blank)    1.14
    ][%        1.08
    Positive Logits
    Token      Logit
    s          2.11
    sr         1.55
    ς          1.50
    mselves    1.48
    sning      1.45
    (blank)    1.42
    (blank)    1.37
    ness       1.34
    g          1.32
    (blank)    1.31
    Activation Density: 0.310%

    No Known Activations