INDEX
    Explanations

    themes of self-worth and the impact of love on self-perception

    Negative Logits
    Token            Logit
    "ivec"           -0.15
    "illis"          -0.15
    "igin"           -0.15
    "éĻ"             -0.14
    "URA"            -0.14
    "versations"     -0.14
    "occo"           -0.14
    "ëĿ"             -0.13
    ".bias"          -0.13
    "swers"          -0.13
    Positive Logits
    Token            Logit
    " worth"         0.22
    " approval"      0.21
    " Approval"      0.20
    " Worth"         0.20
    "approval"       0.20
    "worth"          0.19
    " inade"         0.18
    " adequ"         0.18
    " inferior"      0.18
    "Approval"       0.18
    Activation Density: 0.151%
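
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero; the function name, the threshold, and the sample data are all hypothetical and only illustrate the arithmetic:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means active tokens / total tokens. The source
    does not state its definition or its sampling procedure.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active tokens out of 2000.
sample = [0.0] * 1997 + [0.4, 1.2, 0.7]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density near 0.15% would mean the feature fires on roughly one or two tokens per thousand.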

    No Known Activations