INDEX
    Explanations

    expressions of surprise or unexpected outcomes

    Negative Logits
    Token         Logit
    " cheminée"   -0.45
    "Cabo"        -0.44
    "VYMaps"      -0.44
    "writerow"    -0.44
    " Manus"      -0.42
    " atún"       -0.42
    " lå"         -0.42
    "audrait"     -0.41
    " grasas"     -0.41
    "<code>"      -0.41
    Positive Logits
    Token         Logit
    " Surprise"   0.80
    " surprised"  0.77
    " surprise"   0.77
    " surpris"    0.72
    " Delight"    0.69
    "surprise"    0.68
    "surprised"   0.68
    "Surprise"    0.68
    " surprises"  0.65
    "Surprised"   0.62
    Activation Density: 0.408%

    No Known Activations