INDEX
    Explanations

    the word "surprise" at various levels of activation

    expressions of surprise

    Negative Logits
    nan           -0.83
    folios        -0.81
    ©¶æ           -0.81
    oreal         -0.81
    odynam        -0.80
    agra          -0.80
    tein          -0.78
    İĭ            -0.76
    asus          -0.74
    uel           -0.73
    Positive Logits
     surprise      0.81
     absor         0.79
     surprises     0.79
     guests        0.79
     Surprise      0.76
    ingly          0.76
     Flavoring     0.75
     visitor       0.74
     Squid         0.74
    ãģį           0.71
    Act Density 0.047%

    No Known Activations