INDEX
    Explanations

    proper nouns, particularly names and places

    Negative Logits
    ova
    -0.14
     
    -0.14
    ersen
    -0.14
    iaux
    -0.13
     ell
    -0.13
    Anti
    -0.13
    IDE
    -0.13
     typ
    -0.13
     systematic
    -0.13
    rost
    -0.13
    Positive Logits
    filt
    0.15
     Sellers
    0.15
    stro
    0.15
    ルク
    0.14
    hiba
    0.14
    лем
    0.14
    apel
    0.14
    .tie
    0.14
    ング
    0.14
    inherit
    0.13
    Act Density 0.059%

    No Known Activations