    Explanations

    phrases related to self-talk and internal dialogue

    concepts related to self-reflection and personal actions

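    Tokens are shown in quotes so that a leading space, which is part of the token, stays visible.

    Negative Logits
    Token               Logit
    " Reconstruction"   -0.64
    "ighed"             -0.57
    " purported"        -0.56
    "gerald"            -0.55
    " supplemented"     -0.55
    " Annex"            -0.54
    " instituted"       -0.51
    "Ĭ±"                -0.50
    "Il"                -0.50
    " 1967"             -0.49

    Positive Logits
    Token               Logit
    " yourself"          1.50
    " yourselves"        1.27
    " Yourself"          1.09
    " your"              0.96
    " YOUR"              0.85
    "your"               0.78
    "Your"               0.76
    " Your"              0.72
    "poke"               0.64
    " yours"             0.63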
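As a hedged sketch, the ranking behind two logit lists like the ones above could be reproduced as follows. It assumes, as is common for sparse-autoencoder feature dashboards, that each token's score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The function names `logit_effects` and `top_and_bottom`, and every vector below, are hypothetical and made up for illustration; they are not taken from this dashboard.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score each token as the dot product decoder_dir . W_U[:, token].

    Assumption: this "direct logit effect" convention is how the dashboard
    scores tokens; `unembed` maps each token to its unembedding column.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and the k lowest-scoring tokens."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first, as in "Negative Logits"
    positive = ranked[::-1][:k]  # most positive first, as in "Positive Logits"
    return positive, negative


# Hypothetical toy example: a 3-dimensional residual stream and a
# four-token vocabulary with made-up unembedding columns.
decoder_dir = [1.0, 0.5, -1.0]
unembed = {
    " yourself": [1.0, 1.0, 0.0],
    " your": [0.5, 0.0, -0.5],
    " purported": [0.0, 0.0, 0.5],
    " Reconstruction": [-0.5, 0.0, 0.5],
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("POSITIVE:", positive)  # POSITIVE: [(' yourself', 1.5), (' your', 1.0)]
print("NEGATIVE:", negative)  # NEGATIVE: [(' Reconstruction', -1.0), (' purported', -0.5)]
```

Sorting once and slicing from both ends keeps the two lists consistent with each other. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.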
    Activation density: 0.743%
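A minimal sketch of what an activation-density figure like the one above usually measures: the percentage of sampled tokens on which the feature fires. It assumes a ReLU-style sparse autoencoder, whose activations are non-negative, so "fires" means an activation strictly above zero. The function name `activation_density` and the toy activations are hypothetical and do not come from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens whose feature activation is strictly positive.

    Assumption: activations are non-negative (ReLU SAE), so > 0 means "fired".
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 400 tokens.
acts = [0.0] * 400
acts[10], acts[200], acts[399] = 0.8, 1.3, 0.2
print(f"Activation density: {activation_density(acts):.3f}%")  # Activation density: 0.750%
```

A density below 1% means the feature is silent on the vast majority of tokens.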

    No Known Activations