INDEX
    Explanations

    inquiries and opinions from the audience

    Negative Logits
    Token           Logit
    "weetalert"     -0.15
    "amburger"      -0.15
    "ä¹ĥ"           -0.15
    "ightly"        -0.14
    "weit"          -0.14
    "åħ·"           -0.14
    "ingo"          -0.14
    "à¥įदर"         -0.14
    "PushMatrix"    -0.13
    "ι"             -0.13
    Positive Logits
    Token           Logit
    " think"        0.52
    " thinks"       0.48
    " Think"        0.47
    "Think"         0.45
    " thoughts"     0.43
    "think"         0.41
    " thinking"     0.40
    " THINK"        0.40
    " thought"      0.39
    " Thoughts"     0.38
    Activation Density: 0.051%

    No Known Activations