INDEX
    Explanations

    proper nouns, particularly names and places

    Negative Logits
    Token       Logit
    tik         -0.17
    bedo        -0.15
    ulled       -0.15
    quia        -0.15
    platz       -0.15
    keh         -0.14
    agli        -0.14
    à¸Ĭà¸Ļ      -0.14
    raz         -0.14
     SYS        -0.14
    Positive Logits
    Token       Logit
    erson       0.31
    ley         0.29
    son         0.28
    ford        0.27
    ston        0.27
    lington     0.27
    ington      0.26
    ison        0.25
    ton         0.25
    field       0.25
    Activation Density: 0.228%

    No Known Activations