Explanations
references to social issues, particularly those related to marginalized groups and rights
Negative Logits
llib       -0.16
BarButton  -0.15
WithMany   -0.15
Schwartz   -0.14
Kern       -0.14
asal       -0.14
/******/   -0.14
раж        -0.13
OfSize     -0.13
лишком     -0.13
Positive Logits
treatment  0.24
treating   0.23
targeting  0.22
women      0.21
dành       0.19
treat      0.19
female     0.19
Treatment  0.19
Treatment  0.18
扱         0.17
Activations Density 0.304%
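
As a rough illustration of the density figure, the sketch below assumes that activation density means the fraction of token positions at which the feature fires (activation above zero). The page does not state its definition, so treat this as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the feature
    is active; the dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 996 + [1.2, 0.4, 2.1, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.304% would mean the feature is active on roughly 3 of every 1000 tokens in whatever corpus the dashboard sampled.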