INDEX
    Explanations

    expressions of surprise or unexpected outcomes

    Negative Logits
    Token            Logit
    "cshtml"         -0.54
    " reconoci"      -0.54
    " vœux"          -0.51
    " норма"         -0.49
    " vergleichen"   -0.49
    "peka"           -0.49
    "Blah"           -0.49
    " enabling"      -0.49
    "少了"           -0.48
    "ReadAll"        -0.48
    Positive Logits
    Token            Logit
    " surprise"      2.24
    " unexpected"    2.11
    " surprises"     1.97
    " surprising"    1.88
    "Surprise"       1.84
    " Unexpected"    1.84
    "surprise"       1.82
    " Surprise"      1.77
    "unexpected"     1.75
    "Unexpected"     1.72
    Activation Density: 0.087%

    No Known Activations