INDEX
    Explanations
    Negative Logits
    Token           Logit
    "аж"            -0.08
    "ρισ"           -0.06
    "autor"         -0.06
    ".period"       -0.06
    "(at"           -0.06
    " Mim"          -0.06
                    -0.06
    "LOAD"          -0.06
    "ammo"          -0.06
    "celed"         -0.06
    Positive Logits
    Token           Logit
    " annonce"      0.07
                    0.07
    " используют"   0.06
    " Skywalker"    0.06
                    0.06
    " cool"         0.06
    " Table"        0.06
    " Churchill"    0.06
    " practicing"   0.06
    "__,↵"          0.06
    Activation Density: 0.031%

    No Known Activations