Explanations
mentions of physical or mental disabilities
references to disabilities and handicaps
Negative Logits
liness       -0.87
ership       -0.87
achusetts    -0.69
lling        -0.68
Scully       -0.67
IRD          -0.67
rence        -0.67
ling         -0.66
AK           -0.66
berman       -0.65
Positive Logits
ities         0.90
astics        0.88
umbered       0.81
phant         0.81
aments        0.78
agi           0.75
cycles        0.71
traged        0.69
ounter        0.69
asia          0.68
Activation Density: 0.066%
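Activation density is the share of token positions on which the feature is active. The sketch below shows that calculation. It makes two assumptions. First, "active" means an activation strictly above a threshold, set to 0.0 by default. Second, the function name `activation_density` and the token counts are hypothetical, chosen only to reproduce a 0.066% figure. They do not describe the dataset or tooling behind this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 66 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(66):
    acts[i] = 1.0

print(f"{activation_density(acts):.3%}")  # 0.066%
```

A density this low means the feature fires rarely. That fits a narrow concept such as references to disabilities.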