Explanations
phrases indicating inclusivity or diversity
Negative Logits
ancode      -0.17
ncia        -0.15
tach        -0.15
loom        -0.14
.lesson     -0.14
ax          -0.14
mani        -0.13
aneously    -0.13
abilit      -0.13
ripper      -0.13
Positive Logits
sami        0.16
Mattis      0.15
cấp         0.14
sortable    0.13
iya         0.13
843         0.13
Mixed       0.13
دث          0.13
ette        0.13
enburg      0.13
Activations Density: 0.070%