INDEX
    Explanations

    instances of surprise and conversational exchanges

    Negative Logits
    "affen"          -0.17
    "ipop"           -0.15
    "illard"         -0.15
    "agon"           -0.15
    "imi"            -0.15
    "urum"           -0.15
    "ět"             -0.14
    "علومات"         -0.14
    "usters"         -0.14
    ".removeFrom"    -0.14
    Positive Logits
    " notice"        0.42
    " see"           0.39
    " notices"       0.38
    " sees"          0.38
    " noticed"       0.35
    " noticing"      0.34
    "notice"         0.32
    " seeing"        0.32
    " Notice"        0.31
    "see"            0.31
    Activation Density: 0.139%

    No Known Activations