    Explanations

    instances of specific punctuation and formatting related to names and places

    Negative Logits
    Token           Logit
    "дат"           -0.16
    "auss"          -0.15
    "олов"          -0.15
    "ivre"          -0.14
    "relation"      -0.14
    "Relation"      -0.14
    " Relation"     -0.14
    "ront"          -0.14
    "Transition"    -0.14
    "hte"           -0.13

    Positive Logits
    Token           Logit
    "alian"         0.20
    "atz"           0.19
    "ús"            0.18
    "olor"          0.18
    "ún"            0.18
    "illa"          0.17
    "ú"             0.17
    "oce"           0.16
    ".XR"           0.16
    "aving"         0.16
    Activation Density: 0.002%

    No Known Activations