    Explanations

    phrases related to actions done on someone's behalf or for their benefit

    terms related to power dynamics and authority in relationships

    Negative Logits
    Token        Logit
    "uesday"     -0.78
    "uve"        -0.70
    "adena"      -0.69
    "ammy"       -0.69
    "tein"       -0.66
    " Topic"     -0.66
    "binary"     -0.65
    "ãĤ£"        -0.63
    "Bul"        -0.63
    "usted"      -0.63
    Positive Logits
    Token             Logit
    "steps"           0.98
    "stretched"       0.70
    "."               0.68
    "liest"           0.68
    "books"           0.67
    " selves"         0.64
    " subordinates"   0.61
    " Majesty"        0.60
    "iest"            0.60
    "lessness"        0.60
    Activation Density: 0.262%

    No Known Activations