    Negative Logits
    иб             -0.07
     marrying      -0.06
     Німеч         -0.06
    -expression    -0.06
    inish          -0.06
    前的           -0.06
    yen            -0.06
    едж            -0.06
     material      -0.06
     analyzes      -0.06
    Positive Logits
    .pages         0.07
    .func          0.06
    يز             0.06
     phục          0.06
    τέ             0.06
    |min           0.06
    pending        0.06
    основ          0.06
     +++           0.06
    _rng           0.06
    Activation Density: 0.049%

    No Known Activations