INDEX
    Explanations

    discourse markers and words that indicate reasoning or contrast

    Negative Logits
    Token    Logit
    ^(@)     -0.96
    />";     -0.96
    )|^{     -0.90
    ]";      -0.87
    $")      -0.84
    %";      -0.82
    NUMX     -0.82
    _))      -0.81
    _        -0.81    (token includes a trailing line break)
    %");     -0.80
    Positive Logits
    Token                Logit
    .                    1.04
    ,                    0.85
    ?                    0.79
    !                    0.73
    ;                    0.67
    (token not shown)    0.60
    (token not shown)    0.60
    ..                   0.58
    :                    0.57
    (                    0.54
    Activation Density: 5.635%

    No Known Activations