INDEX
    Explanations

    phrases indicating a historical or factual context

    Negative Logits
    Token       Logit
    "politics"  -0.70
    "fo"        -0.70
    "rene"      -0.65
    "trap"      -0.65
    "marg"      -0.65
    "ben"       -0.64
    "get"       -0.63
    " Panda"    -0.63
    "aiden"     -0.62
    "pez"       -0.62
    Positive Logits
    Token               Logit
    "soever"            1.27
    " we"               0.70
    " they"             0.69
    "xual"              0.68
    " he"               0.67
    "izens"             0.66
    "=-=-=-=-=-=-=-=-"  0.65
    "ippi"              0.64
    " she"              0.64
    "ordan"             0.63
    Activation Density: 0.386%

    No Known Activations