INDEX
    Explanations

    the presence of document structure elements, particularly markers indicating the beginning of sections or other important formatting features

    Negative Logits
    Token       Logit
     Roskov     -0.97
     purpoſe    -0.97
     preſent    -0.95
     cauſe      -0.93
     houſe      -0.93
     juſ        -0.93
     uſe        -0.92
     Reſ        -0.92
     ſy         -0.92
    twimg       -0.90
    Positive Logits
    Token       Logit
    </em>       1.13
    </i>        1.02
    </strong>   0.98
    </b>        0.87
    </sup>      0.86
    </sub>      0.83
    </u>        0.77
     }}$        0.76
    ,           0.73
    </s>        0.72
    Activation Density: 0.055%

    No Known Activations