INDEX
    Explanations

    instances of LaTeX formatting or figures in a document

    Negative Logits
    å¼¾
    -0.17
    ambre
    -0.16
    itar
    -0.15
    italic
    -0.14
    à¤łà¤¨
    -0.14
    itches
    -0.14
    лÑıн
    -0.14
    amet
    -0.14
    elli
    -0.13
    PRS
    -0.13
    Positive Logits
     \
    0.20
    \
    0.19
    	
    0.17
    ovel
    0.16
    ~↵
    0.15
    olvers
    0.15
     Tun
    0.14
    center
    0.14
    576
    0.14
    lever
    0.14
    Act Density 0.017%

    No Known Activations