    Explanations

    proper names, particularly those of researchers and contributors in scientific publications

    Negative Logits
    "otte"       -0.15
    " rost"      -0.15
    "dera"       -0.15
    "awl"        -0.14
    "etty"       -0.14
    "raft"       -0.14
    "arian"      -0.13
    " tarz"      -0.13
    " кал"       -0.13
    "ophobia"    -0.13
    Positive Logits
    " conc"       0.15
    "uede"        0.15
    "ves"         0.15
    "orean"       0.14
    "ahoo"        0.14
    "ij"          0.14
    " ta"         0.14
    "lesson"      0.14
    "拳"          0.14
    " Thi"        0.14
    Activation Density: 0.292%

    No Known Activations