INDEX
    Explanations

    phrases related to politics, lobbying, and financial contributions

    Negative Logits
    Token            Logit
    "udeb"           -0.84
    "oths"           -0.77
    "mere"           -0.74
    "cedented"       -0.74
    "famous"         -0.74
    "show"           -0.73
    "bourg"          -0.73
    "warts"          -0.73
    "older"          -0.71
    "Downloadha"     -0.70

    Positive Logits
    Token            Logit
    " minded"        1.05
    " nature"        0.95
    "ity"            0.93
    " environments"  0.83
    " approach"      0.81
    " solutions"     0.81
    " mode"          0.80
    "ities"          0.79
    " multip"        0.79
    " reuse"         0.76
    Activation density: 1.364%
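
    Activation density is usually reported as the share of tokens in a sample on which the feature fires, i.e. has an activation above zero. A minimal sketch, assuming a zero threshold and a flat array of per-token activations; `activation_density` is a hypothetical helper, and the synthetic sample is chosen only to reproduce a 1.364% figure, not taken from this feature's data.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    # Count active tokens and divide by the total number of tokens sampled.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic sample: 100,000 tokens, of which 1,364 have a positive activation.
acts = np.zeros(100_000)
acts[:1_364] = 0.5
print(f"{activation_density(acts):.3f}%")  # 1.364%
```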

    No Known Activations