    Explanations

    content related to personal reflection and introspection

    Negative Logits
     reluct       -1.39
     emphat       -1.35
     accla        -1.27
     shenan       -1.26
     disagre      -1.23
     milf         -1.20
     indestru     -1.19
     depic        -1.16
     strick       -1.15
     maneu        -1.14
    Positive Logits
     thinking     0.78
     thoughts     0.73
    thinking      0.72
     thought      0.67
    💭            0.66
     Think        0.66
    Thinking      0.65
     think        0.65
    Think         0.65
     Thinking     0.63
    Act Density 0.287%

    No Known Activations