INDEX
    Explanations

    rhetorical questions and conversational language

    Negative Logits
    "uce"         -0.15
    "agen"        -0.14
    "زÛĮ"         -0.14
    "culus"       -0.14
    "ãĥĨãĥ«"      -0.14
    " merit"      -0.14
    "obook"       -0.14
    " imm"        -0.14
    "erg"         -0.14
    "loy"         -0.13

    Positive Logits
    " yeah"       0.18
    " WELL"       0.17
    " Well"       0.15
    " Yeah"       0.15
    "/tos"        0.15
    " chances"    0.15
    " bien"       0.14
    "ãĥ©ãĤ¹"      0.14
    "shima"       0.14
    " well"       0.14
    Activation Density: 0.083%

    No Known Activations