INDEX
    Explanations

    instances of reflection and contemplation about names or concepts

    Negative Logits
    Token        Logit
    "337"        -0.17
    "oli"        -0.16
    "aliz"       -0.16
    "oly"        -0.16
    "elage"      -0.15
    " GRAPH"     -0.15
    "igans"      -0.15
    "thr"        -0.15
    "thesis"     -0.14
    "hf"         -0.14
    Positive Logits
    Token        Logit
    " think"     0.53
    " Think"     0.49
    "think"      0.47
    "Think"      0.45
    " thinking"  0.43
    " thinks"    0.42
    " thought"   0.42
    " THINK"     0.37
    "thinking"   0.36
    " Thinking"  0.36
    Activation Density: 0.044%

    No Known Activations