INDEX
    Explanations

    questions and responses that convey uncertainty or requests for clarification

    Negative Logits
    Token              Logit
    " فريبيس"          -1.05
    " estekak"         -0.91
    " protoimpl"       -0.78
    "RTDA"             -0.77
    " estimés"         -0.75
    "atrième"          -0.74
    "AccessorTable"    -0.74
    " Walkover"        -0.73
    "ngths"            -0.72
    "virons"           -0.72
    Positive Logits
    Token              Logit
    "Do"               0.47
                       0.46
    " venuto"          0.44
    "Can"              0.42
    "וֹ"                0.42
    "Yeah"             0.42
    "he"               0.41
    "Indeed"           0.41
                       0.41
    "He"               0.41
    Activation Density: 0.133%

    No Known Activations