INDEX
    Explanations

    expressions of hopefulness and engagement with the reader

    NEGATIVE LOGITS
    " ought"       -0.16
    "should"       -0.16
    " shouldBe"    -0.15
    " SHOULD"      -0.15
    " skulle"      -0.14
    ".should"      -0.14
    "ÏĢι"          -0.14
    "apper"        -0.14
    " trebuie"     -0.14
    "Should"       -0.13
    POSITIVE LOGITS
    " enjoyed"     0.22
    " guys"        0.22
    " enjoy"       0.20
    " enjoys"      0.19
    "Enjoy"        0.18
    " Enjoy"       0.18
    " Guys"        0.18
    " enjoying"    0.17
    " agree"       0.17
    " found"       0.15
    Act Density 0.048%

    No Known Activations