INDEX
Explanations
policy-related terms
terms related to public policy discussions
Negative Logits
────────    -0.76
upon        -0.73
ITNESS      -0.70
Gaw         -0.70
Stain       -0.69
FORMATION   -0.68
Templ       -0.68
Ago         -0.67
wolves      -0.67
eneg        -0.67
Positive Logits
policy          1.03
makers          0.97
prescriptions   0.96
policy          0.94
making          0.93
interventions   0.93
policies        0.92
stance          0.92
policymakers    0.92
advisers        0.91
Activations Density 0.029%