INDEX
Explanations
references to political entities and affiliations
nationalities and foreign terms
Negative Logits
autorytatywna     -0.60
surla             -0.53
TagMode           -0.53
ujednoznacz       -0.52
الرياضيه          -0.51
RegressionTest    -0.50
itſelf            -0.50
Wies              -0.47
iffance           -0.47
itself            -0.47
Positive Logits
.*")]             0.38
tagHelperRunner   0.38
isComment         0.37
                  0.36
arrings           0.35
iotensin          0.35
cientos           0.34
mtliche           0.32
🏻                0.32
Peruvian          0.31
Activations Density 0.030%