INDEX
    Explanations

    quotation marks and dialogue in the text

    Negative Logits
    ,       -0.25
    "       -0.17
    [       -0.17
    .       -0.16
    &nbsp   -0.16
    :       -0.15
    "s      -0.15
    ,**     -0.15
    "",     -0.15
    \n      -0.15

    Positive Logits
    and     0.40
     And    0.39
    And     0.39
    but     0.32
    But     0.26
     "↵     0.25
     But    0.25
     but    0.25
    Also    0.25
     and    0.25
    Act Density 0.044%

    No Known Activations