    NEGATIVE LOGITS
    Token                 Logit
    " Πολ"                -0.07
    " upbringing"         -0.06
    "της"                 -0.06
    (blank in source)     -0.06
    "nection"             -0.06
    "awns"                -0.06
    "okt"                 -0.06
    " Pixar"              -0.06
    "mpr"                 -0.06
    "avn"                 -0.06
    POSITIVE LOGITS
    Token                 Logit
    " ACK"                0.07
    " BER"                0.07
    " غذ"                 0.07
    " společně"           0.07
    " Chem"               0.06
    " RESULTS"            0.06
    " GLUT"               0.06
    " scrapped"           0.06
    "diag"                0.06
    " elegance"           0.06
    Activation Density: 0.003%

    No Known Activations