INDEX
Explanations
phrases related to politics, lobbying, and financial contributions
Negative Logits
udeb        -0.84
oths        -0.77
mere        -0.74
cedented    -0.74
famous      -0.74
show        -0.73
bourg       -0.73
warts       -0.73
older       -0.71
Downloadha  -0.70
Positive Logits
minded        1.05
nature        0.95
ity           0.93
environments  0.83
approach      0.81
solutions     0.81
mode          0.80
ities         0.79
multip        0.79
reuse         0.76
Activations Density 1.364%