INDEX
    Explanations

    affirmative responses and degrees of confidence in dialogue

    Negative Logits
    " transfieras"  -0.44
    "だよね"  -0.40
    " Yeah"  -0.37
    " kid"  -0.37
    " freakin"  -0.36
    "みんなの"  -0.36
    "👭"  -0.35
    "んだよね"  -0.35
    " Gotta"  -0.34
    " đứa"  -0.34
    Positive Logits
    " sir"  2.16
    "Sir"  1.98
    " Sir"  1.97
    "sir"  1.70
    " SIR"  1.55
    "SIR"  1.51
    " Sirs"  1.34
    " madam"  1.30
    " Madam"  1.21
    " senhor"  1.16
    Activation Density: 0.318%

    No Known Activations