INDEX
    Explanations

    instances of writing or authorship

    Negative Logits
    Token       Logit
    "cke"       -0.15
    "gger"      -0.15
    " Barrett"  -0.14
    "xee"       -0.14
    "-bound"    -0.14
    " ward"     -0.14
    "arro"      -0.14
    "ajo"       -0.14
    " trav"     -0.14
    "SCALE"     -0.14
    Positive Logits
    Token       Logit
    " Bob"      0.15
    "wing"      0.15
    " patt"     0.15
    "uren"      0.14
    "inh"       0.14
    "aza"       0.14
    "shall"     0.14
    "IFS"       0.14
    "ins"       0.14
    "asic"      0.14
    Activation Density: 0.019%

    No Known Activations