INDEX
Explanations
themes related to oppression and injustices faced by marginalized groups
Negative Logits
lify        -0.17
iero        -0.15
isphere     -0.14
ãĥ¼ãĥķ      -0.14
uyu         -0.13
pane        -0.13
allen       -0.13
uest        -0.13
aç          -0.13
erno        -0.13
Positive Logits
experienced  0.19
suffered     0.18
Experienced  0.17
uous         0.17
/errors      0.17
/problems    0.17
ulence       0.16
visited      0.16
ishment      0.16
/error       0.15
Activations Density: 0.117%