INDEX
Explanations
concepts related to diversity and inclusion
Negative Logits
atak      -0.19
ysi       -0.16
fak       -0.15
.dirty    -0.14
à¤Łà¤ķ    -0.14
Dirt      -0.13
Dirty     -0.13
å¯Ł       -0.13
IVING     -0.13
ntax      -0.13
Positive Logits
inclusion    0.45
inclus       0.43
inclusive    0.36
tolerance    0.35
diversity    0.35
inclusive    0.32
clusion      0.31
Diversity    0.29
tol          0.29
equality     0.26
Activations Density: 0.265%