INDEX
    Explanations

    words related to assigning blame or responsibility

    mentions of accountability or attribution of responsibility

    Negative Logits
    Token         Logit
    "tein"        -0.82
    "tering"      -0.69
    "frey"        -0.66
    "gran"        -0.64
    "improve"     -0.63
    "cher"        -0.61
    "ylon"        -0.61
    "quart"       -0.61
    "UGE"         -0.61
    "division"    -0.60
    Positive Logits
    Token         Logit
    "Ohio"        0.84
    " citiz"      0.78
    "oka"         0.72
    "encies"      0.71
    "amaz"        0.71
    " solely"     0.65
    " explan"     0.65
    " adolesc"    0.65
    " stewards"   0.65
    " undermin"   0.64
    Activation Density: 0.030%

    No Known Activations