INDEX
    Explanations

    proper nouns, particularly names of individuals and titles

    Negative Logits
    aju
    -0.16
    ined
    -0.14
    ses
    -0.14
     chaired
    -0.14
    ocs
    -0.14
     coincidence
    -0.14
    abr
    -0.13
    cha
    -0.13
    odel
    -0.13
    ij
    -0.13
    Positive Logits
     is
    0.19
     began
    0.18
    是一
    0.17
    unsch
    0.17
     isa
    0.17
     born
    0.16
     adalah
    0.16
     earned
    0.16
     became
    0.16
     是
    0.16
    Act Density 0.086%

    No Known Activations