Explanations
references to inclusivity or diversity in various contexts
Negative Logits
Token        Logit
.ax          -0.14
linkplain    -0.14
ibri         -0.14
Rubin        -0.13
boxed        -0.13
fty          -0.13
ìķł          -0.13
Demir        -0.13
uty          -0.13
lsi          -0.13
Positive Logits
Token        Logit
stripes      0.18
acles        0.18
ages         0.17
stripe       0.16
isha         0.15
ensch        0.15
isser        0.15
sorts        0.14
age          0.14
573          0.14
Activation Density: 0.022%
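The page does not say how the logit lists or the density figure are computed. The sketch below assumes a common approach for sparse-autoencoder feature dashboards: the feature's decoder direction is projected through the model's unembedding matrix to score every vocabulary token. The highest scores form the positive list and the lowest form the negative list. Activation density is then the percentage of token positions where the feature fires above zero. The function names (`logit_effects`, `top_tokens`, `activation_density`) and the toy matrix values are hypothetical, chosen only for illustration. They are not this dashboard's data or API.

```python
def logit_effects(direction, w_u):
    """Project a feature direction (length d_model) through an unembedding
    matrix w_u (d_model rows x vocab columns), giving one score per token.
    Assumption: this mirrors how the dashboard's logit lists are derived."""
    vocab_size = len(w_u[0])
    return [
        sum(direction[i] * w_u[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_tokens(effects, vocab, k, positive=True):
    """Return the k (token, score) pairs with the highest scores
    (positive=True) or the lowest scores (positive=False)."""
    pairs = sorted(zip(vocab, effects), key=lambda p: p[1], reverse=positive)
    return pairs[:k]


def activation_density(activations):
    """Percentage of token positions where the feature activation is > 0."""
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["stripes", "age", ".ax", "Rubin"]
direction = [1.0, -0.5]
w_u = [
    [0.2, 0.1, -0.1, 0.0],
    [0.04, 0.0, 0.08, 0.26],
]

effects = logit_effects(direction, w_u)
print(top_tokens(effects, vocab, k=2))                  # highest-scoring tokens
print(top_tokens(effects, vocab, k=2, positive=False))  # lowest-scoring tokens
# One active position out of 5,000 tokens gives a density of 0.02%.
print(activation_density([0.0] * 4999 + [1.3]))
```

Under this assumption, a density of 0.022% means the feature fires on roughly 22 of every 100,000 tokens, which marks it as a very sparse feature.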