INDEX
    Explanations

    sentences that convey conclusions or final thoughts

    Negative Logits
    " helper"      -0.76
    " salute"      -0.71
    " panc"        -0.70
    " tyr"         -0.69
    " imperson"    -0.66
    " pillar"      -0.64
    "hemer"        -0.64
    " utter"       -0.64
    " affili"      -0.63
    " alphabet"    -0.62
    Positive Logits
    " Nonetheless"      1.29
    " Regardless"       1.28
    " Nevertheless"     1.26
    " Instead"          1.25
    " Alternatively"    1.23
    " Depending"        1.22
    " Ultimately"       1.20
    " Fortunately"      1.20
    " Flavoring"        1.19
    " Perhaps"          1.17
    Activation Density: 0.966%

    No Known Activations