INDEX
    Explanations

    questions that express disbelief or surprise

    Negative Logits
    "ounder"      -0.16
    "uries"       -0.15
    "ÄĽÅ¾"        -0.15
    "jem"         -0.15
    "ån"          -0.15
    "olec"        -0.15
    "awn"         -0.14
    "алÑİ"        -0.14
    "oor"         -0.14
    "è²Į"         -0.14

    Positive Logits
    " did"         0.19
    ".did"         0.18
    " planet"      0.18
    " do"          0.17
    " Next"        0.16
    ")did"         0.16
    "planet"       0.16
    " kind"        0.16
    " about"       0.15
    " happened"    0.15

    Activation Density 0.058%

    No Known Activations