Explanations
Phrases indicating inclusivity or diversity across various categories.
Negative Logits
avin       -0.17
inen       -0.17
ồng        -0.15
byn        -0.15
usted      -0.14
reater     -0.14
osg        -0.14
oste       -0.14
ideon      -0.14
je         -0.14
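Raw exports of this dashboard show some tokens as strings like "á»ĵng". That pattern matches GPT-2-style byte-level BPE, where every byte is displayed as a printable stand-in character. The sketch below assumes the model uses the GPT-2 bytes-to-unicode table, which the dashboard itself does not state. It inverts that table to recover the underlying UTF-8 text. `decode_token` is a hypothetical helper name. Tokens that contain only part of a multi-byte character, such as "å»", cannot be fully recovered.

```python
# Sketch: invert GPT-2-style byte-level BPE display strings.
# Assumption: the model's tokenizer uses the GPT-2 bytes-to-unicode table.

def bytes_to_unicode() -> dict[int, str]:
    """Map each byte 0-255 to the printable character GPT-2 uses to display it."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted into the range starting at U+0100.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}

# Reverse map: display character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(display: str) -> str:
    """Recover the UTF-8 text behind a byte-level token display string.

    Tokens holding only part of a multi-byte character come back with
    U+FFFD replacement characters.
    """
    raw = bytes(BYTE_DECODER[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")

print(decode_token("á»ĵng"))      # ồng
print(decode_token("Ïĩ"))         # χ
print(decode_token("ãĤŃãĥ³ãĤ°"))  # キング
```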
Positive Logits
alam       0.16
ırak       0.14
χ          0.14
Unchecked  0.14
ToPoint    0.14
å»         0.14
キング     0.14
кул        0.13
dad        0.13
ゾ         0.13
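Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix, then taking the top-k entries at each end. The sketch below shows that computation under stated assumptions: no final LayerNorm, no unembedding bias, and toy sizes in place of a real model. The helper names `feature_logits` and `top_logits` are hypothetical. This is not confirmed to be exactly how this dashboard computes its values.

```python
# Hypothetical sketch: a feature's logit effect as its decoder direction
# projected through the unembedding (no final LayerNorm, no bias).
from typing import List, Sequence, Tuple

def feature_logits(decoder_dir: Sequence[float],
                   W_U: Sequence[Sequence[float]]) -> List[float]:
    """decoder_dir has length d_model; W_U is a d_model x d_vocab matrix."""
    d_vocab = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
            for j in range(d_vocab)]

def top_logits(logits: Sequence[float], vocab: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]            # ascending: most negative first
    positive = ranked[::-1][:k]      # descending: most positive first
    return negative, positive

# Toy example: d_model = 2, vocabulary of 3 tokens.
decoder_dir = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.0],
       [0.1, 0.3, -0.2]]
vocab = ["a", "b", "c"]
neg, pos = top_logits(feature_logits(decoder_dir, W_U), vocab, k=2)
print([t for t, _ in neg])  # ['b', 'c']
print([t for t, _ in pos])  # ['a', 'c']
```

In the toy example, the logits work out to 0.15, -0.25 and 0.1. "b" therefore heads the negative list and "a" heads the positive list.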
Activation Density: 0.023%
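A density of 0.023% means the feature is active on roughly 23 of every 100,000 tokens, so it fires very sparsely. The sketch below computes density as the percentage of token positions whose activation exceeds a threshold. It assumes the threshold is 0, meaning any positive activation counts as firing. The dashboard's exact rule is not stated, and `activation_density` is a hypothetical helper name.

```python
# Sketch: activation density as the percentage of tokens on which a feature fires.
# Assumption: "fires" means activation strictly above a threshold of 0.
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# 23 firing positions out of 100,000 tokens gives a density of 0.023%.
acts = [0.0] * 100_000
for i in range(23):
    acts[i * 4_000] = 1.5   # arbitrary positive activations at spread-out positions
print(activation_density(acts))  # 0.023
```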