INDEX
    Explanations

    punctuation marks, particularly question marks and periods

    Negative Logits
    Token          Logit
    "arg"          -0.17
    "оваÑĢ"       -0.15
    " questions"   -0.15
    " Nor"         -0.14
    " thất"        -0.14
    "argc"         -0.14
    "olla"         -0.14
    "iao"          -0.14
    " sweeping"    -0.13
    "il"           -0.13
    Positive Logits
    Token          Logit
    "Ans"          0.26
    " Ans"         0.24
    "ANS"          0.24
    "Answer"       0.23
    " ans"         0.22
    " Answer"      0.21
    " answer"      0.20
    "answer"       0.20
    "_ans"         0.20
    "ans"          0.20
    Act Density 0.044%

    No Known Activations