INDEX
Explanations
references to institutions related to research and policy, particularly think tanks
references to think tanks and related organizations
Negative Logits
alogue      -0.74
amaz        -0.74
pain        -0.70
satisf      -0.66
Complete    -0.65
illard      -0.65
uder        -0.65
Butcher     -0.64
handc       -0.64
goose       -0.64
Positive Logits
arians        0.99
arian         0.84
instit        0.82
Institution   0.82
chaired       0.80
institute     0.80
convened      0.78
uments        0.77
adviser       0.75
economists    0.75
Activations Density 0.101%