    Explanations

    the presence of specific introductory phrases or transitions in the text

    Negative Logits
    Token               Logit
     myſelf             -1.04
    دانشنامهٔ           -0.97
     becauſe            -0.92
     houſe              -0.91
    BibitemShut         -0.89
     itſelf             -0.89
     Eſ                 -0.89
     ſeveral            -0.88
     purpoſe            -0.88
     Monfieur           -0.87

    Positive Logits
    Token               Logit
    <eos>               1.09
    <bos>               1.02
    </strong>           0.95
    </b>                0.92
    </u>                0.80
    </em>               0.78
    (token not shown)   0.78
    '                   0.74
    (whitespace)        0.73
    </i>                0.72
    Activation Density: 0.015%

    No Known Activations