INDEX
    Explanations

    the presence of specific capitalized names or entities

    Negative Logits
    urs      -0.18
    ιÏĩ      -0.16
    arr      -0.15
    asal     -0.15
    uda      -0.15
    ome      -0.15
    038      -0.15
    ادÙĬ     -0.15
    monic    -0.15
    ois      -0.15
    Positive Logits
    aylor    0.20
    ogue     0.19
    ordin    0.18
    eria     0.18
    eri      0.18
    ayaran   0.18
    yst      0.17
    ields    0.17
    istor    0.17
    urm      0.16
    Activation Density: 0.021%

    No Known Activations