    Negative Logits
    Token               Logit
    (not rendered)      -0.07
    " Zhou"             -0.07
    "zem"               -0.07
    " Rule"             -0.07
    " châu"             -0.06
    (not rendered)      -0.06
    " أكد"              -0.06
    "quartered"         -0.06
    " Tet"              -0.06
    " Author"           -0.06

    Positive Logits
    Token               Logit
    " dinero"           0.07
    "ն"                 0.07
    "oma"               0.07
    "רוצים"             0.07
    " שת"               0.07
    " LogManager"       0.07
    "还能"               0.07
    "alt"               0.07
    " groß"             0.07
    "~~~~~~~~~~~~~~~~"  0.06
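Top positive and negative logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that method; the dashboard itself does not state how its values were computed. The helper names `logit_effects` and `top_and_bottom`, along with the toy two-dimensional vectors, are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding vector to get that token's direct logit effect."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative (most negative first) and the k most
    positive (most positive first) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-d model with a four-token vocabulary (made-up numbers).
    decoder_dir = [1.0, -0.5]
    unembed = {"a": [0.1, 0.0], "b": [0.0, 0.2],
               "c": [0.04, -0.1], "d": [-0.02, 0.02]}
    neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

A real dashboard would run the same ranking over the full vocabulary and then keep the top ten entries in each direction.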
    Activation Density: 0.026%
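Activation density is usually read as the percentage of token positions on which the feature fires. At 0.026%, that works out to roughly 26 activations per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds
    `threshold` (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 10,000 toy positions, 3 of which fire: 3 / 10,000 = 0.03%.
    acts = [0.0] * 9997 + [0.5, 1.2, 0.1]
    print(f"{activation_density(acts):.3f}%")
```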

    No Known Activations