INDEX
    Negative Logits
        "Forgery"       -0.07
        " Proc"         -0.07
        "Descri"        -0.07
        ".animate"      -0.07
        " certainty"    -0.07
        "/tmp"          -0.06
        " cellar"       -0.06
        "Alamat"        -0.06
        "-described"    -0.06
        "included"      -0.06
    Positive Logits
        (not rendered)  0.07
        " CHAR"         0.06
        "άκ"            0.06
        "TEL"           0.06
        "ρω"            0.06
        (not rendered)  0.06
        "mit"           0.06
        "ूर"            0.06
        " бать"         0.06
        "Comm"          0.06
    Activation Density: 0.039%

    No Known Activations