    NEGATIVE LOGITS
    "、_"         -0.10
    " articles"   -0.09
    "warts"       -0.09
    " pam"        -0.09
    " essays"     -0.09
    " article"    -0.09
    "æĸĩ竳"       -0.09
    " Articles"   -0.08
    " pamph"      -0.08
    ",、"         -0.08

    POSITIVE LOGITS
    " reading"    0.30
    " book"       0.29
    " Reading"    0.26
    "Reading"     0.26
    "reading"     0.25
    " Book"       0.24
    " books"      0.23
    " library"    0.23
    "Book"        0.22
    " Books"      0.21

    Act Density 0.152%

    No Known Activations