INDEX
    Explanations

    out words related to researching or exploring a topic in depth

    Negative Logits
        Token           Logit
        " emphat"       -1.25
        " fte"          -1.10
        " effe"         -1.08
        " wien"         -1.08
        " fta"          -1.06
        " intermitt"    -1.04
        " affor"        -1.03
        " reluct"       -1.03
        " perfet"       -1.03
        " accla"        -1.01

    Positive Logits
        Token           Logit
        " learn"         0.85
        <bos>            0.77
        "learn"          0.75
        "Learn"          0.70
        " Learn"         0.68
        " learns"        0.68
        " learned"       0.66
        " how"           0.66
        " discover"      0.66
        " out"           0.66

    Act Density 0.055%

    No Known Activations