INDEX
    Explanations

    inquiries about self-reflection and personal opinions

    Negative Logits
    inton           -0.16
    ště             -0.15
    499             -0.15
    ":"'            -0.15
    iliz            -0.14
     Tire           -0.14
    ури             -0.14
    allel           -0.14
    adesh           -0.13
     BACKGROUND     -0.13
    Positive Logits
     think          1.02
     Think          0.94
     thinking       0.91
    think           0.91
     THINK          0.89
     thinks         0.89
    Think           0.88
     thoughts       0.82
     thought        0.79
    thinking        0.76
    Activation Density: 0.449%

    No Known Activations