    Explanations

    technical error messages or code snippets

    Negative Logits
    eneg            -0.64
    eatures         -0.61
     KL             -0.61
    afety           -0.61
    anchester       -0.59
    everal          -0.59
    cffff           -0.58
    okin            -0.58
     Bare           -0.58
    leneck          -0.57

    Positive Logits
    ))))            1.11
    "}              1.10
     attRot         0.97
    }}              0.97
    ;;;;;;;;;;;;    0.95
    };              0.91
    """             0.91
    )))             0.90
    .''.            0.86
    }.              0.86

    Act Density 0.114%

    No Known Activations