    Explanations

    mathematical expressions or notation

    Negative Logits
    Token      Logit
    `)`        -0.52
    `”)`       -0.46
    `),`       -0.46
    ` ),`      -0.44
    `”),`      -0.43
    ` })}`     -0.42
    `"),`      -0.41
    ` )`       -0.40
    `’)`       -0.40
    ` )}`      -0.39
    Positive Logits
    Token      Logit
    `}^{`      1.53
    `^{`       1.17
    `|^{`      1.05
    `\}^{`     1.03
    `)}^{`     1.03
    ` ^{`      0.94
    `]^{`      0.93
    `}}^{`     0.93
    ` }}^{`    0.92
    ` $^{`     0.92
    Activation Density: 0.115%

    No Known Activations