    Explanations

    information related to political figures and their statements

    Negative Logits
    Token             Logit
    "selves"          -0.79
    " selves"         -0.69
    " theirs"         -0.67
    " Parenthood"     -0.65
    "animate"         -0.65
    " decay"          -0.65
    "destruct"        -0.63
    "Reviewer"        -0.62
    " inferior"       -0.61
    "Daddy"           -0.61
    Positive Logits
    Token             Logit
    " himself"        0.93
    " quoted"         0.92
    " referring"      0.92
    " speaking"       0.84
    " interviewed"    0.83
    " overseeing"     0.83
    " cited"          0.82
    " firsthand"      0.82
    " recommending"   0.81
    " personally"     0.81
    Activation Density: 0.580%

    No Known Activations