INDEX
    Explanations

    structural words at the beginning of sentences or paragraphs in academic papers

    NEGATIVE LOGITS
    geries          -0.07
     Hao            -0.07
    antar           -0.07
     ëĭ¤ìļ´ë°Ľê¸°    -0.06
    eb              -0.06
    ses             -0.06
    bable           -0.06
    sert            -0.06
    bris            -0.06
     íĮĮìĿ¼ì²¨ë¶Ģ    -0.06

    POSITIVE LOGITS
    orem            0.12
    amp             0.09
    oretical        0.07
    iming           0.06
    igh             0.06
    odor            0.06
    Åĵ              0.06
    ory             0.06
    notated         0.06
    ories           0.06
    Act Density 0.656%

    No Known Activations