INDEX
    Explanations

    names of researchers and their affiliations

    Negative Logits
    Token               Logit
    "bai"               -0.14
    " Lump"             -0.14
    "933"               -0.14
    "uja"               -0.14
    "ฯ"                 -0.14
    "hone"              -0.13
    ".CO"               -0.13
    "oyer"              -0.13
    "urities"           -0.13
    "Longrightarrow"    -0.13
    Positive Logits
    Token               Logit
    " et"               0.21
    " Orc"              0.18
    ".Department"       0.15
    " çŃī"              0.14
    "Department"        0.14
    " Department"       0.14
    "ãĤī"               0.14
    "ocene"             0.14
    " abstraction"      0.14
    "ler"               0.14
    Activation Density: 0.102%

    No Known Activations