INDEX
    Explanations

    phrases indicating responsibility and accountability in a political context

    Negative Logits
    ンディ
    -0.16
    ollen
    -0.15
    utherford
    -0.15
    ESSAGES
    -0.15
    řev
    -0.15
     Branch
    -0.14
    ά
    -0.14
    stí
    -0.14
    loff
    -0.14
    klä
    -0.14
    Positive Logits
    anzi
    0.15
    bakan
    0.15
    arti
    0.15
    PIX
    0.15
    ẹ
    0.15
    Ë
    0.14
    eny
    0.14
    olest
    0.14
     Batter
    0.14
    IFS
    0.14
    Activation Density 0.011%

    No Known Activations