INDEX
    Explanations

    affirmative responses and expressions of agreement

    Negative Logits
    Token           Logit
    "för"           -0.38
    "Towns"         -0.35
    "Αν"            -0.34
    (blank)         -0.34
    " gider"        -0.34
    "ARROLL"        -0.34
    "帖最后由"          -0.34
    "んですけど"         -0.33
    "Token"         -0.33
    "currentColor"  -0.33
    Positive Logits
    Token           Logit
    " yes"          1.06
    " Yes"          0.95
    "Yes"           0.94
    "yes"           0.90
    " YES"          0.82
    " yep"          0.77
    "YES"           0.76
    " Yep"          0.73
    "Yep"           0.68
    "yep"           0.68
    Activation Density: 0.214%

    No Known Activations