INDEX
Explanations
mentions of transparency and accountability in various contexts
Negative Logits
Interstitial        -0.71
erville             -0.69
anova               -0.67
awar                -0.65
Stars               -0.64
enegger             -0.62
AMI                 -0.60
ucky                -0.60
################    -0.60
aceous              -0.60
Positive Logits
recy                0.98
parency             0.96
transparency        0.83
disclosures         0.83
disclosure          0.82
rity                0.80
ibility             0.75
DeVos               0.75
Disclosure          0.74
transparent         0.73
Activations Density 0.035%