    Explanations

    instances of the word "surprise" and its variations

    Negative Logits
    Token    Logit
    osy      -0.19
    ezi      -0.18
    s        -0.17
    emann    -0.17
    embro    -0.16
    ele      -0.16
    ello     -0.16
    ez       -0.16
    sms      -0.15
    oen      -0.15
    Positive Logits
    Token        Logit
    veys         0.39
    prisingly    0.38
    prising      0.37
    rounded      0.36
    prises       0.35
    rogate       0.34
    prise        0.34
    geries       0.31
    veillance    0.31
    faces        0.31
    Activation Density: 0.009%

    No Known Activations