INDEX
    Explanations

    instances of polite requests or inquiries in conversation

    Negative Logits
    gaard       -0.17
    居民        -0.16
    avar        -0.15
    oque        -0.15
    lu          -0.14
    afa         -0.14
    ynes        -0.14
    ubs         -0.14
    öm          -0.14
     bureaucr   -0.14
    Positive Logits
    çͳ          0.16
    YRO         0.16
    rana        0.16
    erea        0.15
    ROTO        0.14
    rang        0.14
    aptcha      0.14
    вичай       0.13
    agers       0.13
    AGER        0.13
    Act Density 0.056%

    No Known Activations