    Explanations

    references to various "series" in different contexts

    Negative Logits
    -0.40
    de
    -0.39
    .
    -0.35
    1
    -0.35
    ↵↵
    -0.35
     (
    -0.35
     {
    -0.33
    -0.33
    <i>
    -0.33
    import
    -0.33
    Positive Logits
    <unused14>    0.98
    <unused79>    0.98
    <unused52>    0.98
    <unused3>     0.97
    <unused41>    0.97
    [@BOS@]       0.97
    <unused8>     0.97
    <unused16>    0.97
    <unused42>    0.97
    <unused68>    0.97
    Act Density 0.788%

    No Known Activations