INDEX
    Explanations

    references to historical or mythological figures and events

    Negative Logits
    izu    -0.17
    abei    -0.15
    lah    -0.14
    енз    -0.14
     haf    -0.14
    datum    -0.14
    atif    -0.14
    ált    -0.14
    diag    -0.14
     Domino    -0.14
    Positive Logits
     Pand    0.22
     Vy    0.19
     Hast    0.19
     sage    0.18
     Vide    0.17
     Utt    0.17
     Dw    0.16
     Vir    0.16
    Vir    0.16
     Dra    0.16
    Activation Density: 0.059%

    No Known Activations