INDEX
    Explanations

    conversational markers indicating uncertainty or interactivity in dialogue

    Negative Logits
    Token         Logit
    "yn"          -0.15
    "olet"        -0.15
    "pher"        -0.15
    "ÌĨ"          -0.14
    "gnore"       -0.14
    "velt"        -0.14
    "sez"         -0.14
    "zel"         -0.14
    "opolitan"    -0.13
    "rrha"        -0.13
    Positive Logits
    Token         Logit
    " Briggs"     0.15
    "owo"         0.14
    "под"         0.13
    "tod"         0.13
    "ibri"        0.13
    "åºı"         0.13
    " Downs"      0.13
    "asl"         0.13
    " Roth"       0.13
    "[s"          0.13
    Activation Density: 0.161%

    No Known Activations