INDEX
    Explanations

    lines or sections of code related to documentation or comments

    Negative Logits
    Token     Logit
    " I"      -0.49
    "I"       -0.47
              -0.47
    "..."     -0.43
    "    "    -0.42
    "↵↵"      -0.42
              -0.42
    " ["      -0.42
    "["       -0.42
    "if"      -0.41
    Positive Logits
    Token           Logit
    " queſta"       1.23
    "<unused74>"    1.18
    "<unused51>"    1.18
    "<unused68>"    1.18
    "<unused14>"    1.17
    "<unused8>"     1.17
    "<unused16>"    1.17
    "<unused23>"    1.17
    "<pad>"         1.17
    "[@BOS@]"       1.17
    Act Density 0.007%

    No Known Activations