    Explanations

    mentions of authoritative or governmental figures and actions

    discussions around decision-making and accountability in public contexts

    Negative Logits
    /               -0.66
    Newsletter      -0.63
    taboola         -0.63
     respectively   -0.58
     Byz            -0.56
    ezvous          -0.56
    igon            -0.56
     resid          -0.55
    %);             -0.55
     Trog           -0.55
    Positive Logits
    ?!"             0.96
     such           0.96
    !?"             0.88
     blatantly      0.86
     suddenly       0.85
    ?!              0.84
     mere           0.84
    !?              0.83
     solely         0.80
     someone        0.79
    Activation Density: 0.899%

    No Known Activations