Negative Logits
illow
-0.10
Dudley
-0.09
preventative
-0.09
Cove
-0.09
ing
-0.09
iel
-0.09
vironment
-0.08
edList
-0.08
Juan
-0.08
networking
-0.08
Positive Logits
policy
0.25
policy
0.20
政策
0.20
policies
0.19
interventions
0.17
Policy
0.17
策
0.17
design
0.17
Policy
0.16
intervention
0.16
Activations Density 0.061%