    Explanations

    questions or statements starting with the word "Asked"

    instances of questions being asked or inquiries being made

    Negative Logits
        Token            Logit
        "Fit"            -0.80
        "ECD"            -0.68
        "MpServer"       -0.67
        "jam"            -0.65
        "AMY"            -0.65
        "Cod"            -0.63
        "equal"          -0.61
        "agement"        -0.60
        "align"          -0.59
        "EStreamFrame"   -0.58

    Positive Logits
        Token            Logit
        " questions"      1.21
        " rhet"           1.08
        " Questions"      1.02
        " whether"        1.01
        " why"            0.95
        " quizz"          0.94
        " question"       0.92
        " probing"        0.84
        " sarcast"        0.84
        " afterwards"     0.82

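On dashboards like this one, positive and negative logits are usually obtained by multiplying the feature's decoder direction by the model's unembedding matrix and ranking the resulting per-token weights. The sketch below assumes that definition. The function names `logit_weights` and `top_and_bottom`, along with the d_model x vocab_size layout of the unembedding, are hypothetical choices made for illustration:

```python
from typing import Dict, List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption (hypothetical layout): unembed[i][j] is the weight from
    residual dimension i to vocabulary token j (d_model x vocab_size).
    """
    d_model = len(decoder_dir)
    weights: Dict[str, float] = {}
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        weights[token] = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
    return weights


def top_and_bottom(weights: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])  # ascending by weight
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom
```

For example, calling `top_and_bottom` on a few of the values listed above, `{" questions": 1.21, " whether": 1.01, "Fit": -0.80, "ECD": -0.68}` with `k=2`, returns `" questions"` and `" whether"` as the top pair and `"Fit"` and `"ECD"` as the bottom pair.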
    Activation Density: 0.032%
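Here, activation density is taken to mean the share of sampled tokens on which the feature's activation is strictly positive; that definition is an assumption. Under it, 0.032% works out to about 32 firing tokens per 100,000, or roughly 1 in 3,125. The following sketch computes that fraction. The name `activation_density` and its `threshold` parameter are hypothetical:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumption: a feature "fires" on a token when its activation > threshold
    (threshold defaults to 0.0, i.e. any positive activation counts).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)
```

As a check, 32 positive activations among 100,000 tokens give 0.00032, which formats as `0.032%` with Python's `%` presentation type.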

    No Known Activations