INDEX
Explanations
discussions around political structures and representation of marginalized communities
Negative Logits
ungal     -0.17
èĤ²       -0.17
ADED      -0.16
desn      -0.16
olid      -0.16
ativo     -0.16
373       -0.15
atten     -0.15
itters    -0.14
173       -0.14
Positive Logits
minorities      0.39
minority        0.39
minor           0.35
Minor           0.35
Minor           0.35
minor           0.32
Minority        0.30
marg            0.28
vulnerable      0.26
marginalized    0.26
Activations Density 0.145%