INDEX
Explanations
themes related to inequality and marginalization
Negative Logits
Token     Logit
chia      -0.16
iju       -0.16
illow     -0.15
afil      -0.15
ンク      -0.15
ijkstra   -0.14
rego      -0.14
acker     -0.14
avou      -0.14
anka      -0.14
Positive Logits
Token     Logit
discrim   0.34
marginal  0.33
mist      0.32
treated   0.31
excluded  0.30
marg      0.28
malt      0.28
ignored   0.28
-treated  0.26
left      0.25
Activations Density: 0.219%