Explanations
terminology related to social justice and historical injustices
Negative Logits
isplay      -0.16
eeper       -0.15
eyond       -0.15
Terr        -0.15
ee          -0.15
mÄĽ         -0.14
lsen        -0.14
following   -0.14
ccion       -0.14
umb         -0.14
Positive Logits
.scalablytyped   0.18
fitte            0.15
ordum            0.14
Ïģαβ             0.14
ifetime          0.14
ëŀĺ              0.14
ãĥ©ãĤ¹           0.14
obe              0.14
rops             0.14
enden            0.14
Activations Density: 0.030%