INDEX
    Explanations

    words related to personal names or identities

    Negative Logits
     Ν                  -0.79
     N                  -0.72
     NE                 -0.72
     Ne                 -0.70
     Ն                  -0.68
    reportWebVitals     -0.65
    styleType           -0.63
     NA                 -0.63
     Na                 -0.62
     Ни                 -0.61
    Positive Logits
    n                   1.37
    nn                  1.25
    na                  1.25
    nen                 1.22
    nos                 1.17
    ned                 1.17
    nan                 1.16
    ne                  1.16
    nal                 1.15
    nas                 1.14
    Activation Density: 0.292%
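
    As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that "density" means the share of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions for illustration, not taken from this page.

    ```python
    from typing import Sequence


    def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
        """Return the fraction of positions whose activation exceeds `threshold`.

        Assumption: density = (number of active positions) / (total positions).
        """
        if not activations:
            raise ValueError("activations must be non-empty")
        active = sum(1 for a in activations if a > threshold)
        return active / len(activations)


    # Hypothetical sample: 100,000 token positions, 292 of them active.
    sample = [0.0] * 99_708 + [1.5] * 292
    print(f"{activation_density(sample):.3%}")
    ```

    Under that assumed definition, a density of 0.292% would correspond to roughly 3 active positions per 1,000 sampled tokens.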

    No Known Activations