    Explanations

    references to specific individuals, particularly those involved in politics or public affairs

    Negative Logits
    Token             Logit
    "o"               -0.20
    "er"              -0.20
    "opt"             -0.16
    "HandlerContext"  -0.16
    " consenting"     -0.16
    "ney"             -0.15
    "ois"             -0.15
    "owi"             -0.15
    "ores"            -0.14
    "oxy"             -0.14

    Positive Logits
    Token             Logit
    "ipeg"            0.23
    "sylvania"        0.19
    "edy"             0.18
    "igans"           0.17
    "ovation"         0.17
    "nun"             0.17
    "igan"            0.16
    "ery"             0.16
    "lw"              0.16
    "elho"            0.16

    Activation Density: 0.014%

    No Known Activations