INDEX
Explanations
terms related to social and political issues impacting marginalized communities
Head Attr Weights
0:0.02
1:0.02
2:0.07
3:0.20
4:0.11
5:0.04
6:0.03
7:0.05
8:0.06
9:0.12
10:0.13
11:0.07
Negative Logits
esty           -1.54
orget          -1.33
[+             -1.31
eneg           -1.25
glim           -1.21
epad           -1.19
NRS            -1.16
appropriately  -1.15
gow            -1.11
itaire         -1.11
Positive Logits
.).            1.80
)).            1.69
).[            1.64
]).            1.60
zinski         1.49
respectively   1.41
?).            1.41
Slav           1.34
Vald           1.32
武             1.28
Activations Density 1.207%