INDEX
    Explanations

    words related to power, control, authority, and influence

    terms related to power and control dynamics

    Negative Logits
    "iate"          -0.70
    "ODE"           -0.69
    "individual"    -0.68
    "ãĥ£"           -0.68
    "rine"          -0.67
    "ECD"           -0.66
    "rum"           -0.66
    " Scientist"    -0.66
    "ribe"          -0.65
    "aired"         -0.65
    Positive Logits
    "xual"          0.86
    "anship"        0.84
    " stemming"     0.81
    " reversal"     0.77
    " compulsion"   0.77
    " prejudice"    0.76
    " escalation"   0.74
    "uality"        0.73
    " avoidance"    0.73
    "eering"        0.71
    Act Density 0.078%

    No Known Activations