INDEX
    Explanations

    Instances of the "<bos>" token, which could denote the beginning of new content or sections in the text.

    Negative Logits
    )");
    
    -0.97
    )";
    
    -0.93
    ']);
    
    -0.88
     }}$}
    -0.88
    "){
    
    -0.88
    ."));
    -0.87
    `,
    
    -0.84
     ')
    
    -0.84
    \"");
    -0.84
    "});
    -0.84
    Positive Logits
     #
    2.12
    #
    1.83
     \#
    1.77
    .#
    1.68
    #
    1.63
    \#
    1.60
     (#
    1.57
    :#
    1.52
    )#
    1.43
    ('#
    1.42
    Act Density 0.194%

    No Known Activations