INDEX
    Explanations

    references to personal pronouns and expressions of intent or desire

    Negative Logits
    Token          Logit
    " desiring"    -0.73
    "Жела"         -0.71
    "desired"      -0.69
    " wishing"     -0.67
    " wished"      -0.63
    " Desired"     -0.63
    "Wishing"      -0.62
    "wish"         -0.61
    " Wishing"     -0.60
    " desired"     -0.59
    Positive Logits
    Token          Logit
    " want"        1.06
    " wan"         0.70
    " wants"       0.65
    " quiero"      0.57
    " WAN"         0.56
    " voulez"      0.52
    " Wan"         0.51
    " veut"        0.50
    " quieren"     0.49
    " Want"        0.49
    Activation Density: 0.277%
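
    To put the 0.277% density figure in concrete terms, the sketch below converts a density percentage into an expected count of active tokens. It assumes the figure means the percentage of tokens on which the feature has a nonzero activation; the dashboard does not define it here, so treat that reading as an assumption. The function name and the 100,000-token sample size are hypothetical and chosen only for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature is active.

    Assumption (not confirmed by the dashboard): density_percent is the
    percentage of tokens with a nonzero activation for this feature.
    """
    if not 0.0 <= density_percent <= 100.0:
        raise ValueError("density_percent must be between 0 and 100")
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    # Convert the percentage to a fraction, then scale by the sample size.
    return density_percent / 100.0 * n_tokens


# Density value reported on the dashboard.
ACT_DENSITY_PERCENT = 0.277

# Hypothetical sample size, used only to make the number tangible.
SAMPLE_TOKENS = 100_000

print(round(expected_active_tokens(ACT_DENSITY_PERCENT, SAMPLE_TOKENS)))
```

    Under that reading, the feature would be expected to fire on roughly 3 of every 1,000 tokens, which is consistent with a narrow feature tied to a small family of tokens.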

    No Known Activations