    Explanations

    repeated phrases related to reflection and introspection

    Negative Logits
    Token                  Logit
    "IsContent"            -0.70
    " nahilalakip"         -0.69
    "extAlignment"         -0.68
    "iNdEx"                -0.68
    "providedIn"           -0.67
    "quiera"               -0.66
    " ویکی‌پدیای"           -0.66
    "expandindo"           -0.66
    " saites"              -0.65
    "BufferException"      -0.65

    Positive Logits
    Token                  Logit
    " thinking"             1.11
    " thoughts"             1.04
    " think"                1.02
    " thought"              1.00
    " THINK"                0.99
    "Thinking"              0.98
    "thought"               0.98
    " Thinking"             0.98
    " Think"                0.98
    " consideration"        0.97
    Activation Density: 0.142%

    No Known Activations