    Explanations

    instances where someone is surprised

    Negative Logits
    Token          Logit
    "alach"        -0.82
    "ngth"         -0.76
    "href"         -0.74
    "obe"          -0.70
    "bern"         -0.70
    "utf"          -0.70
    "ciplinary"    -0.70
    "itte"         -0.70
    "iffe"         -0.69
    "tein"         -0.69
    Positive Logits
    Token          Logit
    " enough"      0.77
    " how"         0.76
    " aback"       0.73
    " Squid"       0.70
    "ウス"         0.69
    " Pew"         0.69
    "cules"        0.69
    " Howell"      0.65
    " Robin"       0.64
    "090"          0.64
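    The logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to produce such lists is to take the dot product of the feature's decoder direction with each column of the model's unembedding matrix and keep the most negative and most positive results. The sketch below assumes exactly that. It is not necessarily how this dashboard computes its values (real pipelines may also fold in layer norm or other scaling), and `logit_effects`, `top_logits` and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot product of a feature's decoder direction (length d_model) with every
    vocabulary column of an unembedding matrix (d_model rows x vocab columns)."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab_size)]


def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]            # smallest effects first
    positive = ranked[::-1][:k]      # largest effects first
    return negative, positive


# Hypothetical toy values: a 2-dimensional residual stream and a 5-token vocabulary.
vocab = [" enough", " how", " aback", "alach", "ngth"]
decoder_dir = [1.0, -0.5]
unembed = [[0.5, 0.4, 0.3, -0.6, -0.2],
           [-0.4, -0.2, -0.1, 0.4, 0.6]]

effects = logit_effects(decoder_dir, unembed)
negative, positive = top_logits(effects, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

    Sorting once and slicing from both ends keeps the sketch short; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid the full sort.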
    Act Density 0.036%
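    Act Density is read here as the share of tokens on which the feature is active. Assuming "active" means an activation strictly greater than zero (the dashboard's exact threshold and token sample are not stated), the figure reduces to a simple ratio. `activation_density` and the sample below are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens observed, so no density to report
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Hypothetical sample: the feature fires on 36 of 100,000 tokens.
sample = [0.0] * 99_964 + [2.5] * 36
print(f"Act Density {activation_density(sample):.3f}%")
```

    Under this reading, a density of 0.036% means the feature fires on roughly 36 tokens in every 100,000.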

    No Known Activations