INDEX
    Explanations

    proper names, particularly those related to educational and governmental figures

    Negative Logits
    seau     -0.18
    avery    -0.17
    zew      -0.16
    ardon    -0.15
    oku      -0.15
    uya      -0.15
    lider    -0.15
    attle    -0.14
    inski    -0.14
    ă        -0.14
    Positive Logits
    uni      0.15
    орот     0.15
    zk       0.14
    wayne    0.14
    eeper    0.14
     Roma    0.14
     Den     0.14
    objc     0.14
     Mn      0.13
    207      0.13
    Activation Density: 0.081%

    No Known Activations