    Explanations

    specific names and proper nouns across different languages

    Tokens preceding non-English text

    foreign language beginnings

    Negative Logits
    Token      Logit
    ","        -0.70
    (blank)    -0.70
    "("        -0.67
    ":"        -0.67
    "."        -0.66
    "!"        -0.64
    "1"        -0.62
    "<eos>"    -0.62
    ";"        -0.62
    " ("       -0.61
    Positive Logits
    Token            Logit
    "ſelves"         1.16
    " Мексичка"      1.12
    "NUMX"           1.10
    "ouſly"          1.08
    "ſelf"           1.08
    "ghijklmnop"     1.06
    "geſ"            1.04
    " ERSITY"        1.00
    "leſs"           1.00
    "eſt"            0.98
    Activation Density: 0.060%

    No Known Activations