INDEX
    Explanations

    elements and instances of self-reflection and existential questioning

    Negative Logits
        " brid"           -0.81
        " increasingly"   -0.78
        " raft"           -0.78
        " pressing"       -0.78
        " revers"         -0.77
        " favour"         -0.76
        " continuous"     -0.75
        " favor"          -0.75
        " coral"          -0.74
        " cycl"           -0.73

    Positive Logits
        "And"              1.51
        "Advertisements"   1.42
        "It"               1.40
        "They"             1.40
        "Instead"          1.38
        "Because"          1.38
        "That"             1.37
        "However"          1.36
        "Anyone"           1.36
        "Until"            1.35

    Activation Density: 0.401%

    No Known Activations