    Negative Logits
    Token    Value
    " "      1.49
    "-"      1.30
    " I"     1.22
    " ("     1.20
    " in"    1.11
    "oi"     1.11
    "m"      1.07
    " )"     1.05
    "k"      1.02
    " A"     1.02
    Positive Logits
    Token    Value
    "у"      1.90
    "ко"     1.73
    "то"     1.73
    "р"      1.69
    "ти"     1.66
    "си"     1.52
    "ди"     1.48
    "й"      1.30
    "ан"     1.29
    "о"      1.27
    Act Density 0.001%

    No Known Activations