INDEX
    Explanations

    proper nouns, particularly names of individuals and organizations

    Negative Logits
    Token          Logit
    "lingen"       -0.16
    "lymp"         -0.16
    "stown"        -0.15
    "ród"          -0.15
    "adelphia"     -0.15
    "atown"        -0.14
    "ervised"      -0.14
    " governing"   -0.14
    "orama"        -0.14
    "ợ"            -0.14

    Positive Logits
    Token          Logit
    "iaux"          0.18
    " CONTRIBUT"    0.18
    "aint"          0.17
    "ides"          0.17
    " III"          0.15
    "essian"        0.15
    "ä¸ī级"         0.15
    "avage"         0.15
    " ìϏ"           0.15
    ".gmail"        0.15
    Activation Density: 0.308%
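    Activation density is typically reported as the percentage of token positions on which the feature's activation is above zero; at 0.308%, that is roughly 3 positions in every 1,000 tokens. The sketch below assumes that definition, a flat list of per-token activations, and a threshold of 0; the function name `activation_density` is hypothetical.

```python
# Minimal sketch (assumption): density = share of token positions where the
# feature's activation exceeds a threshold (0 by default), as a percentage.

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical toy data: 1,000 token positions, the feature fires on 3 of them.
acts = [0.0] * 997 + [1.2, 0.4, 2.0]
print(f"{activation_density(acts):.3f}%")
```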

    No Known Activations