    Explanations

    proper nouns, particularly names and titles

    Negative Logits
    Token       Logit
    "awan"      -0.15
    "anke"      -0.15
    "arness"    -0.15
    "oš"        -0.15
    " paren"    -0.14
    "osy"       -0.14
    "olio"      -0.14
    "elon"      -0.14
    "ãĤ"        -0.14
    "folio"     -0.14
    Positive Logits
    Token       Logit
    "aiser"     0.16
    "zcze"      0.15
    "ober"      0.15
    "ãĥ¼ãĥª"    0.14
    "atorial"   0.14
    "itant"     0.14
    "ZH"        0.14
    "WSTR"      0.14
    "darwin"    0.14
    "ÏĦιÏĥ"    0.14
    Act Density 0.075%

    No Known Activations
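
    For context on the "Act Density" figure above, here is a minimal sketch of how such a density could be computed. It assumes density means the fraction of sampled tokens on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is (number of active tokens) / (number of tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active tokens out of 4000 sampled tokens.
sample = [0.0] * 3997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this assumed definition, a density of 0.075% corresponds to roughly 3 active tokens in every 4000 sampled.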