    Explanations

    punctuation marks, specifically quotation marks

    Negative Logits
    -0.17
     (“
    -0.17
     “[
    -0.17
    -0.17
    „V
    -0.16
    „M
    -0.16
    пример
    -0.15
    „N
    -0.15
    “Oh
    -0.14
    „J
    -0.14
    Positive Logits
     said
    0.48
    said
    0.35
     says
    0.33
     explained
    0.24
     according
    0.24
    ÂĿ
    0.24
     say
    0.23
    says
    0.23
     сказал
    0.23
     wrote
    0.22
    Act Density 0.081%

    No Known Activations