INDEX
    Explanations

    specific nouns related to literature and authors

    Negative Logits
    Token        Logit
    "ovich"      -0.19
    " zv"        -0.17
    "aver"       -0.17
    " kv"        -0.16
    " Shay"      -0.16
    "ayd"        -0.16
    "AVA"        -0.16
    "pv"         -0.15
    " Royale"    -0.15
    "ayer"       -0.15

    Positive Logits
    Token        Logit
    " Jew"        0.23
    " Paw"        0.21
    "jaw"         0.21
    " Sew"        0.20
    " Lew"        0.20
    "wj"          0.20
    " Wor"        0.20
    "elow"        0.19
    " pj"         0.19
    "ds"          0.19

    Activation Density: 0.020%

    No Known Activations