INDEX
    Explanations

    phrases related to politics, power dynamics, and societal issues

    Negative Logits
    "ichen"         -0.76
    "quire"         -0.68
    "queue"         -0.68
    "adish"         -0.68
    "ades"          -0.67
    "uled"          -0.65
    "lette"         -0.65
    " consulted"    -0.63
    "mentioned"     -0.62
    " reimb"        -0.62
    Positive Logits
    " seriousness"      1.20
    " superiority"      1.03
    " greatness"        1.00
    " absurdity"        0.99
    " resilience"       0.99
    " individuality"    0.98
    " masculinity"      0.98
    " sincerity"        0.98
    " versatility"      0.98
    " willingness"      0.98
    Activation Density: 1.731%

    No Known Activations