INDEX
    Explanations

    conversational fillers and casual discourse markers

    Negative Logits
    Token      Logit
    "igy"      -0.19
    "itler"    -0.16
    "اط"       -0.16
    ".tel"     -0.14
    "eway"     -0.14
    "ama"      -0.14
    "foy"      -0.14
    "wg"       -0.13
    "lena"     -0.13
    " Ain"     -0.13

    Positive Logits
    Token      Logit
    " er"      0.47
    " um"      0.43
    "um"       0.34
    " err"     0.34
    " well"    0.33
    " uh"      0.33
    " ah"      0.33
    " shall"   0.31
    " erm"     0.30
    "uh"       0.30

    Activation Density: 0.146%

    No Known Activations