    NEGATIVE LOGITS
    Token        Value
    "an"         1.38
    "t"          1.28
    " to"        1.26
    (missing)    1.21
    " a"         1.18
    "A"          1.13
    "im"         1.11
    "int"        1.11
    "in"         1.01
    "to"         1.01
    POSITIVE LOGITS
    Token            Value
    "лы"             1.22
    "ер"             1.11
    "є"              0.96
    "も含"           0.95
    " "              0.95
    " dangereux"     0.92
    " 없다"          0.92
    "ри"             0.90
    "рити"           0.89
    (missing)        0.89
    Activation Density: 0.000%

    No Known Activations