INDEX
    Explanations

    expressions of surprise or disbelief

    Negative Logits
    Token      Logit
    "aurus"    -0.16
    "ption"    -0.15
    "apon"     -0.15
    "stown"    -0.14
    " ug"      -0.14
    "encer"    -0.14
    "ега"      -0.14
    "atis"     -0.14
    "<IM"      -0.14
    "essim"    -0.13

    Positive Logits
    Token      Logit
    "Oh"       0.17
    " yes"     0.16
    "gross"    0.15
    "338"      0.15
    "dre"      0.14
    " Spear"   0.14
    " Hutch"   0.14
    "èª"       0.14
    " gross"   0.14
    " Gross"   0.14
    Activation Density: 0.028%

    No Known Activations