INDEX
    Negative Logits
    (blank)            1.26
     około             1.22
    (blank)            1.22
     lainnya           1.19
    休闲               1.19
    Від                1.16
     اکثر              1.16
     partiellement     1.13
     subconsciously    1.12
     możesz            1.11
    Positive Logits
    h                  1.42
    are                1.38
    om                 1.21
    ا                  1.20
    おります           1.13
    i                  1.12
    have               1.07
    ен                 1.06
    en                 1.05
    (blank)            1.05
    Activation Density: 0.001%

    No Known Activations