INDEX
    Explanations

    words related to policies and their impact

    phrases detailing detrimental policies and their impacts

    Negative Logits
    odiac         -0.75
    rete          -0.71
    raq           -0.70
    Sync          -0.69
     Beaver       -0.69
    reciation     -0.68
    Pixel         -0.65
    Registered    -0.62
    atha          -0.62
     Feather      -0.61
    Positive Logits
     harms        1.53
     exacerbate   1.51
     undermine    1.49
     undermines   1.48
     exacerb      1.45
     impover      1.41
     devast       1.36
     undermined   1.35
     jeopard      1.35
     adversely    1.33
    Activation Density 0.467%

    No Known Activations