    Explanations

    captions and figure labels within a document

    Negative Logits
    Token      Logit
    " patch"   -0.16
    " frank"   -0.15
    "oris"     -0.15
    "patch"    -0.15
    "usi"      -0.14
    " neck"    -0.14
    "ales"     -0.14
    " Patch"   -0.14
    "asi"      -0.14
    "vez"      -0.14
    Positive Logits
    Token               Logit
    "arella"            0.17
    "oulos"             0.17
    ".ArgumentParser"   0.15
    "iola"              0.14
    "itol"              0.14
    "opoulos"           0.14
    "reed"              0.14
    ".tbl"              0.14
    "abis"              0.14
    "allery"            0.14
    Activation Density: 0.011%

    No Known Activations