    Explanations

    expressions indicating denial or negation

    Negative Logits
     dip      -0.17
     forg     -0.17
    ugas      -0.16
    emap      -0.16
     Dip      -0.15
    hem       -0.15
     Alam     -0.14
     Gang     -0.14
     nice     -0.14
    onz       -0.14
    Positive Logits
    ónico     0.18
    nodoc     0.17
    iker      0.15
    è¶Ĭ       0.15
    FINITE    0.15
    DownList  0.15
    iotic     0.14
    icer      0.14
    éļł       0.14
    iday      0.14
    Activation Density: 0.026%

    No Known Activations