INDEX
    Explanations

    sentences where people state what they want

    Negative Logits
    "اظ"              -0.49
    " sauvages"       -0.49
    " فريبيس"         -0.48
    "invokeLater"     -0.47
    "freopen"         -0.47
    "Positive"        -0.47
    " nici"           -0.46
    " препратки"      -0.46
    "rhosis"          -0.46
    "LEncoder"        -0.46

    Positive Logits
    " want"           1.13
    " wants"          1.13
    "want"            0.97
    " wanting"        0.94
    " Want"           0.93
    " WANT"           0.86
    "wants"           0.86
    "Want"            0.85
    " wanted"         0.83
    "wanted"          0.78
    Activation Density: 0.862%

    No Known Activations