INDEX
    Explanations

    references to figures or illustrations in the text

    Negative Logits
    Token              Logit
    " Reſ"             -0.54
    " juſ"             -0.44
    " Inſ"             -0.43
    " ſon"             -0.41
    " ſta"             -0.39
    " Perſ"            -0.39
    " ſtand"           -0.39
    "HtmlAttribute"    -0.38
    " Chriftian"       -0.38
    " Diſ"             -0.37
    Positive Logits
    Token              Logit
    " Figure"          3.08
    " figure"          2.92
    "Figure"           2.81
    " Fig"             2.56
    "figure"           2.48
    "Fig"              2.33
    " fig"             2.23
    " figura"          2.23
    " FIGURE"          2.20
    " Figures"         2.17
    Activation Density: 1.971%

    No Known Activations