INDEX
    Explanations

    most important or crucial

    NEGATIVE LOGITS
    of          1.11
    building    1.09
    tion        1.07
    tional      1.07
    lines       1.06
    lude        1.04
    top         1.03
    or          1.02
    life        1.02
    tions       1.01
    POSITIVE LOGITS
     voila       1.06
     whatnot     1.02
     um          1.01
     thinks      0.94
     constantly  0.94
     excite      0.93
     learns      0.93
     encodes     0.93
     messed      0.90
     dans        0.89
    Act Density 0.065%

    No Known Activations