INDEX
Explanations
words related to personal names or identities
Negative Logits
Ν                -0.79
N                -0.72
NE               -0.72
Ne               -0.70
Ն                -0.68
reportWebVitals  -0.65
styleType        -0.63
NA               -0.63
Na               -0.62
Ни               -0.61
Positive Logits
n    1.37
nn   1.25
na   1.25
nen  1.22
nos  1.17
ned  1.17
nan  1.16
ne   1.16
nal  1.15
nas  1.14
Activations Density 0.292%