INDEX
Explanations
words related to social welfare programs or benefits
references to welfare and related social programs
Negative Logits
xual     -0.77
ergy     -0.75
Brun     -0.67
ombs     -0.66
itars    -0.64
inx      -0.64
Zheng    -0.64
aque     -0.63
beit     -0.62
Frog     -0.60
Positive Logits
elfare           1.10
welfare          0.96
recipients       0.94
reform           0.86
beneficiaries    0.82
benefit          0.80
Welfare          0.80
entitle          0.79
queens           0.76
fare             0.76
Activations Density 0.022%