INDEX
    Explanations

    instances of the word "surprise."

    Negative Logits
    ©¶æ  -1.06
    İĭ  -0.92
    nan  -0.92
    ssl  -0.83
    oreal  -0.83
    bis  -0.82
    asus  -0.81
    everal  -0.80
    oran  -0.76
    arre  -0.76
    Positive Logits
     surprise  0.92
     surprises  0.90
     Surprise  0.88
    ingly  0.87
    ously  0.78
     Flavoring  0.73
     absor  0.71
    Berry  0.71
     onlook  0.69
     Squid  0.68
    Act Density 0.027%

    No Known Activations