INDEX
    Explanations

    references to nationality or citizenship, particularly American and English identities

    Negative Logits
    inspace     -0.16
    zburg       -0.16
    bate        -0.15
    oyer        -0.14
    vinc        -0.14
    WithType    -0.14
     Ade        -0.14
    cord        -0.14
    atk         -0.14
    cdecl       -0.14
    Positive Logits
    avern       0.18
    alth        0.15
    ADDE        0.15
    ukan        0.14
    bench       0.14
    κÏĮ         0.13
    ran         0.13
    ká          0.13
    aven        0.13
    appen       0.13
    Act Density 0.015%

    No Known Activations