INDEX
    Explanations

    proper nouns, especially names of individuals

    Negative Logits
    "abant"       -0.17
    " kho"        -0.16
    "inium"       -0.15
    "ceph"        -0.15
    "loo"         -0.15
    "زاÙĨ"        -0.15
    ".ls"         -0.14
    "ilis"        -0.14
    "ÙĪØ§ÙĨ"      -0.14
    "Scores"      -0.14
    Positive Logits
    "alfa"        0.20
    "åĦĢ"         0.18
    "/stretch"    0.15
    " Güven"      0.15
    "ADO"         0.14
    "å±Ĩ"         0.14
    "ãĥ«"         0.14
    "ONO"         0.14
    "dsa"         0.14
    " summ"       0.13
    Act Density 0.001%

    No Known Activations