    Explanations

    LaTeX formatting and figure references in scientific documents

    Negative Logits
        " mur"      -0.14
        " Mur"      -0.14
        "309"       -0.14
        "SX"        -0.13
        "ews"       -0.13
        "æ°"        -0.13
        " иÑģ"      -0.13
        " Yani"     -0.13
        " poc"      -0.13
        "sel"       -0.13
    Positive Logits
        "caption"       0.27
        " caption"      0.24
        " Caption"      0.24
        "figcaption"    0.24
        "Caption"       0.22
        "-caption"      0.20
        ".caption"      0.18
        "zoom"          0.17
        " captions"     0.17
        "ouz"           0.16
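    The dashboard does not state how the logit values are computed. A common convention in sparse-autoencoder dashboards, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and list the tokens with the largest and smallest results. The sketch below shows that projection and ranking on made-up toy weights; the helpers `logit_effects` and `top_bottom` and all numbers are hypothetical, not this feature's real data.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding (assumed convention).

    unembed[i][j] is the weight from residual-stream dimension i to token j.
    """
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }


def top_bottom(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k largest (positive) and k smallest (negative) token effects."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy example with made-up numbers (hypothetical, not the real model weights).
vocab = ["caption", " mur", "zoom"]
decoder_dir = [1.0, 0.5]
unembed = [[0.2, -0.1, 0.1], [0.1, -0.08, 0.0]]
positive, negative = top_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(positive, negative)
```

    Note that leading spaces are part of the token strings (" mur" and "mur" are different tokens), which is why the tables quote each token.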
    Activation Density 0.015%
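    Activation density is assumed here to mean the fraction of sampled token positions on which the feature's activation is positive; under that reading, 0.015% corresponds to about 15 firing positions per 100,000 tokens. A minimal sketch with a hypothetical `activation_density` helper and toy data:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumes one activation value per token position; threshold=0.0 counts any
    positive activation as "firing".
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data (hypothetical): the feature fires on 15 of 100,000 tokens.
acts = [0.0] * 99_985 + [1.3] * 15
print(f"Activation density: {activation_density(acts):.3%}")
```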

    No Known Activations