INDEX
    Explanations

    references to figures, illustrations, and supporting data in a document

    Negative Logits
    "ocaly"      -0.07
    " переб"     -0.06
    ".li"        -0.06
    "zin"        -0.06
    "μεν"        -0.06
    "zew"        -0.06
    "gow"        -0.06
    "dorf"       -0.06
    "zel"        -0.06
    "-mf"        -0.06
    Positive Logits
    " figure"    0.12
    " legends"   0.11
    " figures"   0.11
    " Legends"   0.11
    " legend"    0.11
    "-figure"    0.10
    " tables"    0.10
    " caption"   0.09
    " Figure"    0.09
    " figura"    0.08
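    The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The sketch below shows one way such lists can be produced. It assumes (not stated on this page) that each value is the plain dot product of the feature's decoder direction with the matching column of the unembedding matrix, with no layer-norm scaling; `logit_effects` and `top_and_bottom` are hypothetical helper names, and the toy numbers are made up.

```python
# Hypothetical helpers: rank tokens by a feature's direct effect on the logits.
# Assumption: effect[t] = decoder_dir . unembed[:, t] (no layer-norm scaling).


def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with every unembedding column.

    decoder_dir: list of d_model floats.
    unembed: d_model rows, each a list of d_vocab floats ([d_model][d_vocab]).
    """
    d_vocab = len(unembed[0])
    return [
        sum(w * row[t] for w, row in zip(decoder_dir, unembed))
        for t in range(d_vocab)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    vocab = [" figure", "zin", " caption", "dorf"]
    unembed = [[0.12, -0.06, 0.09, -0.05], [5.0, 5.0, 5.0, 5.0]]
    effects = logit_effects([1.0, 0.0], unembed)
    negative, positive = top_and_bottom(effects, vocab, k=2)
    print("Negative:", negative)
    print("Positive:", positive)
```

    Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.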
    Activation Density: 0.011%
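    An activation density of 0.011% means the feature fires on roughly 11 of every 100,000 tokens. A minimal sketch of the calculation follows, assuming density is the share of tokens with a strictly positive activation; the page does not state its threshold or the dataset it was measured on, and `activation_density` is a hypothetical helper name.

```python
# Hypothetical helper: activation density as a percentage of tokens.
# Assumption: a token counts as "active" when its activation is strictly above 0.


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activation_density needs at least one token")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 11 active tokens out of 100,000 corresponds to a density of 0.011%.
    acts = [0.0] * 100_000
    for i in range(11):
        acts[i * 9_000] = 1.5
    print(f"{activation_density(acts):.3f}%")
```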

    No Known Activations