    Explanations

    positive assessments of books and written works

    Negative Logits
    " repr"        -0.15
    " happier"     -0.15
    "fo"           -0.15
    "uring"        -0.15
    " glorious"    -0.14
    " rebut"       -0.14
    "ailable"      -0.13
    " succinct"    -0.13
    " beloved"     -0.13
    " Radi"        -0.13
    Positive Logits
    " informative"      0.28
    "Inform"            0.24
    "inform"            0.23
    " informat"         0.23
    " Inform"           0.23
    " informational"    0.22
    " enlight"          0.22
    "eye"               0.22
    " instruct"         0.21
    " educational"      0.20
    Activation Density: 0.192%

    No Known Activations