    Explanations

    references to literary works and their authors

    Negative Logits
    Token        Logit
    "affe"       -0.15
    "itchen"     -0.14
    " İli"       -0.14
    "pch"        -0.14
    "ocz"        -0.14
    " vite"      -0.14
    " пот"       -0.13
    "uild"       -0.13
    " bergen"    -0.13
    "hiro"       -0.13
    Positive Logits
    Token        Logit
    " bear"      0.25
    " Band"      0.23
    "band"       0.23
    "Band"       0.23
    " band"      0.21
    " Bear"      0.20
    " Piper"     0.20
    "bear"       0.20
    " Tas"       0.20
    " Vand"      0.20
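    The page does not say how these logit values are produced. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix W_U and rank vocabulary tokens by the result. This is a pure-Python sketch under that assumption. `logit_effects`, `top_and_bottom`, and the toy weights are hypothetical illustrations. They are not this feature's real weights or the site's own code.

```python
from typing import List, Tuple


# Hypothetical helper (assumed method): project a feature's decoder direction
# through the unembedding matrix W_U, given as d_model rows x vocab_size columns.
def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    # The logit contribution for token t is the dot product of the decoder
    # direction with column t of W_U.
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


# Hypothetical helper: return the k most positive and k most negative tokens.
def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy values for illustration only, not this feature's real weights.
    vocab = [" bear", " Band", "affe", "itchen"]
    decoder_dir = [1.0, 0.5]
    unembed = [
        [0.2, 0.1, -0.1, 0.0],
        [0.1, 0.2, -0.1, -0.2],
    ]
    positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, 2)
    print("positive:", positive)
    print("negative:", negative)
```

    Because a leading space is part of the token itself, " bear" and "bear" are ranked as separate entries. That explains the near-duplicate rows in the tables above.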
    Activation Density: 0.069%
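    The page gives only the activation-density number. This sketch assumes that activation density means the percentage of tokens on which the feature fires, that is, has an activation above zero. Under that reading, 0.069% corresponds to roughly 69 firing tokens per 100,000. The `activation_density` helper and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


# Hypothetical helper (assumed definition): percentage of tokens whose
# activation exceeds `threshold` (0.0 by default, i.e. the feature "fires").
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Toy data: 69 firing tokens out of 100,000.
    acts = [0.0] * (100_000 - 69) + [1.0] * 69
    print(f"{activation_density(acts):.3f}%")  # prints 0.069%
```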

    No Known Activations