INDEX
    Explanations

    references to individual actions or identities in a narrative context

    Negative Logits
    Token     Logit
    "utto"    -0.17
    "uzzi"    -0.17
    "eyer"    -0.16
    "ogan"    -0.16
    "udo"     -0.15
    "velt"    -0.15
    ".aw"     -0.15
    "rown"    -0.15
    "oven"    -0.14
    "agne"    -0.14
    Positive Logits
    Token         Logit
    " flat"       0.18
    "oram"        0.17
    " wid"        0.16
    "flat"        0.15
    " Wid"        0.15
    "bp"          0.15
    "wid"         0.14
    " Burnett"    0.14
    "788"         0.14
    " cle"        0.14
    Activation Density: 0.018%

    No Known Activations