INDEX
    Explanations

    the presence of document structure or formatting indicators

    Negative Logits
    ,      -0.79
    _      -0.78
    -      -0.77
    /      -0.63
    (      -0.59
    ma     -0.58
    us     -0.58
    .      -0.58
    ...    -0.56
    ;      -0.56
    Positive Logits
    "):
    
    1.29
    )";
    
    1.19
     ')
    
    1.18
    )");
    
    1.18
    '))
    
    1.17
    "])
    
    1.17
    "]);
    
    1.15
    ")));
    
    1.15
    ']))
    
    1.11
     The
    1.09
    Activation Density: 0.317%

    No Known Activations