INDEX
    Explanations

    phrases indicating capability or ability to perform actions

    Negative Logits
    ur         -0.17
    indent     -0.15
    osphere    -0.15
    onta       -0.15
     fair      -0.15
    viso       -0.14
    الع        -0.14
    sen        -0.14
     Ney       -0.14
    ェ         -0.14
    Positive Logits
    hazi       0.16
    endez      0.15
    ivant      0.15
    veyor      0.15
    aims       0.14
    ifikasi    0.14
    ehir       0.14
    _PUS       0.14
    oppins     0.14
    rosse      0.13
    Act Density 0.037%

    No Known Activations