    Explanations

    references to authors and book titles in literary works

    Negative Logits
    " minus"   -0.15
    "eding"    -0.14
    "衣"       -0.14
    " exact"   -0.14
    "SP"       -0.14
    " away"    -0.14
    "響"       -0.14
    " ras"     -0.14
    "HN"       -0.13
    "iment"    -0.13
    Positive Logits
    "rokes"    0.17
    "icolor"   0.16
    "otos"     0.16
    "oksen"    0.15
    "bilt"     0.15
    "泣"       0.15
    "imonial"  0.15
    "ascus"    0.15
    "inator"   0.15
    "agas"     0.15
    Act Density 0.162%

    No Known Activations