    Explanations

    Instances of conversational markers and interjections indicating agreement or confirmation.

    Negative Logits
    '">//'            -0.16
    'icare'           -0.15
    'πει'             -0.14
    ' trainable'      -0.14
    'ellido'          -0.14
    'naz'             -0.14
    'wcs'             -0.13
    'NavController'   -0.13
    'incess'          -0.13
    ' 木'             -0.13

    Positive Logits
    'dere'             0.16
    ' derec'           0.16
    'арамет'           0.15
    'mos'              0.15
    'askell'           0.15
    'udur'             0.14
    '.hh'              0.14
    'sar'              0.13
    ' meaning'         0.13
    'ance'             0.13
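    On sparse-autoencoder feature dashboards of this kind, the negative and positive logit lists above are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix, then taking the most negative and most positive entries per vocabulary token. The sketch below illustrates that convention under stated assumptions: the function names, the toy matrices, and the plain dot product (no normalization, no bias term) are hypothetical and not taken from this page.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction with every column of the unembedding.

    Assumed shapes (hypothetical): decoder_dir has length d_model;
    unembed is d_model x vocab_size (row = model dimension, column = token id).
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Walk the ascending order backwards so the largest effect comes first.
    positive = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return negative, positive


# Toy example with hypothetical numbers: d_model = 2, vocabulary of 4 tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.0, 0.3],
           [0.4, 0.2, -0.2, 0.0]]
vocab = ["a", "b", "c", "d"]
negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

    Sorting once and reading both ends keeps the two lists consistent with each other; a real dashboard would do the same projection over a vocabulary of tens of thousands of tokens, typically with a tensor library rather than nested Python loops.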
    Activation Density: 0.130%
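    The activation density figure above is usually read as the share of token positions on which the feature fires: 0.130% corresponds to roughly 13 tokens in every 10,000. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above a threshold of zero; the function name and the `threshold` parameter are illustrative assumptions, not part of any dashboard's API.

```python
from typing import Iterable, Sequence


def activation_density(activations: Iterable[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    `activations` holds one list of per-token activations per sequence.
    `threshold` is a hypothetical parameter; 0.0 counts any positive
    activation as the feature firing.
    """
    total = 0
    active = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    # Guard against an empty input instead of dividing by zero.
    return 100.0 * active / total if total else 0.0


# 13 firing positions out of 10,000 tokens: 100 * 13 / 10,000 = 0.13 percent.
toy = [[1.0] * 13 + [0.0] * 9987]
density = activation_density(toy)
print(f"{density:.3f}%")
```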

    No Known Activations