INDEX
    Explanations

    mathematical expressions and formal structures in the text

    Negative Logits
    Token       Logit
    " $"        -0.52
    "$"         -0.50
    "douard"    -0.47
    "web"       -0.45
    " gas"      -0.45
    " ball"     -0.44
    " Web"      -0.43
    " web"      -0.43
    " male"     -0.43
    ")$"        -0.43
    Positive Logits
    Token               Logit
    " $\"               1.16
    "$\"                0.87
    "}{$\"              0.85
    "exitRule"          0.82
    "adaptiveStyles"    0.78
    "~$\"               0.76
    "SourceChecksum"    0.73
    "]='\"              0.73
    " saites"           0.69
    "ГЛА"               0.68
    Activation Density: 2.050%

    No Known Activations