Explanations
disability and discrimination
Negative Logits
飲む    0.41
poner    0.40
clouds    0.39
país    0.39
immort    0.39
paysans    0.39
weet    0.39
poisoned    0.38
slav    0.37
branded    0.37
Positive Logits
Disability    1.40
wheelchair    1.39
disability    1.38
инвали    1.33
Disabilities    1.28
disabled    1.23
disabilities    1.22
♿    1.20
assistive    1.19
wheelchairs    1.17
Activations Density: 0.139%