    Explanations

    references to historical or geographical locations

    Negative Logits
    Token    Logit
    avez     -0.16
    zure     -0.16
    zan      -0.16
    zman     -0.15
    @nate    -0.15
    léd      -0.15
    ーツ     -0.15
    ائب      -0.14
    vak      -0.14
    zee      -0.14

    Positive Logits
    Token    Logit
    aph       0.31
    azy       0.31
    ushed     0.29
    allowed   0.26
    ith       0.24
    irs       0.23
    istr      0.22
    asty      0.22
    ards      0.22
    odge      0.22
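    Logit lists like the two above are typically obtained by projecting a feature's decoder direction onto the model's unembedding and keeping the most positive and most negative entries. The sketch below assumes that procedure. The function names, the toy vectors, and the dictionary-of-columns representation of the unembedding are hypothetical and are not taken from this dashboard's own code.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: assumes each token's logit effect is the dot product
# of the feature's decoder vector with that token's unembedding column.
def logit_effects(decoder_vec: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct effect of the feature direction on each token's logit."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col))
            for tok, col in unembed.items()}

def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]         # lowest values first
    positive = ranked[::-1][:k]   # highest values first
    return positive, negative

# Toy 2-dimensional example (made-up numbers, not real model weights).
decoder_vec = [0.5, -1.0]
unembed = {
    "aph": [1.0, 0.0],
    "azy": [0.0, -1.0],
    "avez": [-1.0, 0.5],
    "zee": [0.2, 0.2],
}
pos, neg = top_logits(logit_effects(decoder_vec, unembed), k=2)
```

    In the toy example, "azy" and "aph" end up in the positive list and "avez" and "zee" in the negative list. Real dashboards may additionally normalize or rescale the unembedding before ranking.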
    Act Density 0.012%
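    Activation density is usually read as the share of tokens on which the feature fires at all. Under that assumption, 0.012% corresponds to roughly 12 firing tokens per 100,000. The helper name and the zero threshold in this sketch are assumptions for illustration.

```python
from typing import List

# Hypothetical helper: assumes "firing" means activation strictly above
# a threshold (0.0 by default) and reports the result as a percentage.
def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Made-up example: 12 firing tokens out of 100,000.
acts = [0.0] * 99_988 + [1.0] * 12
density = activation_density(acts)
```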

    No Known Activations