    Explanations

    uppercase letters in the text

    Negative Logits
    mlin     -0.17
    ru       -0.17
    yla      -0.16
    ира      -0.16
    io       -0.15
    ully     -0.15
    ullet    -0.15
    yy       -0.15
    ern      -0.15
    gnore    -0.15
    Positive Logits
    em       0.18
    hum      0.17
     bread   0.17
    pard     0.17
    oga      0.17
    ex       0.16
    KM       0.15
    emm      0.15
    los      0.15
    rix      0.15
    Activation Density: 0.172%

    No Known Activations