    Explanations

    percentage symbols and related formatting in code

    Negative Logits
    -0.40    (unrendered token)
    -0.40     Hallen
    -0.38     forward
    -0.37    ta
    -0.36    (unrendered token)
    -0.35    confirm
    -0.35     dabei
    -0.34     Kirkland
    -0.34    ,
    -0.34    forward
    Positive Logits
    0.97     <?
    0.96    "><?
    0.94     "<?
    0.91     '<?
    0.91    <?
    0.88    :%
    0.87     ${\
    0.86    ${\
    0.86    ><?
    0.85     (%
    Act Density 0.208%

    No Known Activations