INDEX
    Explanations

    structured numeric elements, likely related to sections or subsections of a document

    Negative Logits
    "</b>"             -0.72
    "</i>"             -0.62
    "<b>"              -0.60
    "<eos>"            -0.60
    " </"              -0.60
    "<i>"              -0.57
    (token missing)    -0.56
    "></"              -0.56
    (token missing)    -0.55
    "</"               -0.53
    Positive Logits
    " iii"             1.27
    "VII"              1.22
    " VII"             1.22
    " VIII"            1.16
    "III"              1.15
    "VIII"             1.10
    " XII"             1.09
    " XIII"            1.09
    "XIII"             1.08
    " IX"              1.08
    Activation Density: 0.255%

    No Known Activations