INDEX
    Explanations

    instances of questions and affirmative responses in dialogue

    Negative Logits
    ".Apis"          -0.17
    "daq"            -0.15
    "ÏĨι"            -0.14
    "andom"          -0.14
    "asl"            -0.14
    "̣"              -0.14
    "리ì§Ģ"          -0.14
    ".cloudflare"    -0.13
    "è§"             -0.13
    " Birds"         -0.13
    Positive Logits
    " yes"           0.25
    "yes"            0.22
    " Yes"           0.22
    " res"           0.22
    " YES"           0.21
    (blank)          0.19
    " Maybe"         0.19
    " Hell"          0.19
    " NO"            0.19
    "Yes"            0.19
    Activation Density: 0.076%

    No Known Activations