INDEX
Explanations
policy-related terms and political actions
references to social and economic policies affecting marginalized groups
Negative Logits
vant       -0.64
lor        -0.62
agonist    -0.61
Intent     -0.60
itely      -0.58
herein     -0.58
vous       -0.57
hid        -0.57
Feature    -0.57
fan        -0.57
POSITIVE LOGITS
mosques          1.06
prisons          1.00
universities     0.92
Guantanamo       0.91
incarceration    0.91
hospitals        0.90
pensions         0.89
contraceptives   0.89
incarcer         0.89
TPP              0.89
Activations Density 0.588%