INDEX
    Explanations

    mentions of political front-runners

    repeated references to political candidates or front-runners in elections

    Negative Logits
    Reward         -0.61
     Curve         -0.61
     Definitions   -0.61
     Kard          -0.59
     Crime         -0.57
     Reincarn      -0.57
     Mean          -0.56
    FORE           -0.56
    nia            -0.55
    ution          -0.55
    Positive Logits
    runners        1.41
    runner         1.16
    iers           1.11
    page           1.00
    loading        0.96
    office         0.95
    ben            0.95
    bench          0.94
    liner          0.94
    liners         0.91
    Activation Density: 0.036%

    No Known Activations