INDEX
    Explanations

    LaTeX document setup and styling

    Negative Logits
    "})
    0.56
     ""))
    0.52
    ")))
    0.50
    "});
    0.50
    ())))
    0.46
     Swezey
    0.45
    ")}}
    0.43
    ."))
    0.43
    ())),
    0.43
    ')))
    0.43
    Positive Logits
    ]{
    1.10
    ={
    0.78
    =
    0.76
    !]
    0.73
    }]{
    0.72
    ]{\
    0.70
    ]=
    0.70
    )]{
    0.68
    }{
    0.63
    .]:
    0.61
    Act Density 0.003%

    No Known Activations