INDEX
Explanations
words expressing inclusivity and totality
Negative Logits
rike     -0.16
дина     -0.15
stal     -0.15
рик      -0.15
aby      -0.15
ьте      -0.15
daq      -0.14
ietet    -0.14
dale     -0.13
nets     -0.13
Positive Logits
errat    0.17
uding    0.17
ende     0.16
usive    0.16
gam      0.15
iles     0.15
ahi      0.15
igator   0.15
ah       0.15
uve      0.15
Activations Density 0.095%