INDEX
Explanations
references to vulnerable and marginalized groups in society
Negative Logits
ONO        -0.16
Guys       -0.15
mate       -0.15
participants  -0.14
ÑĢев       -0.14
detr       -0.14
zens       -0.14
allis      -0.14
IRTH       -0.14
Brake      -0.14
Positive Logits
disabled   0.28
disabled   0.25
military   0.24
Disabled   0.23
Disabled   0.23
college    0.22
disables   0.21
_disabled  0.20
DISABLE    0.20
farm       0.20
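Feature dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes exactly that (and ignores any final layer-norm scaling); the function names and the toy vectors are hypothetical and are not taken from this feature.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed method: dot product of the feature's decoder direction
    with each token's unembedding vector (no layer-norm scaling)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_logits(effects: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending score
    negative = ranked[:k]
    positive = list(reversed(ranked[-k:]))  # largest score first
    return negative, positive

# Hypothetical toy data: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = [1.0, 0.0]
unembed = {
    "disabled": [0.3, 0.1],
    "farm": [0.2, 0.5],
    "Guys": [-0.15, 0.0],
    "mate": [-0.1, 0.2],
}
effects = logit_effects(decoder_dir, unembed)
negative, positive = top_logits(effects, k=2)
print([tok for tok, _ in negative])  # ['Guys', 'mate']
print([tok for tok, _ in positive])  # ['disabled', 'farm']
```

A positive score means the feature, when active, pushes the model toward predicting that token; a negative score pushes away from it.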
Activation Density: 0.403%
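Activation density is usually reported as the percentage of token positions on which the feature fires; 0.403% corresponds to roughly 4 tokens in every 1,000. The sketch below assumes "fires" means an activation strictly above zero; the function name, the threshold parameter, and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds threshold
    (assumed definition of 'firing')."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: 1,000 tokens, of which 4 activate the feature.
sample = [0.0] * 996 + [0.7, 1.2, 0.3, 0.9]
print(activation_density(sample))  # 0.4
```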