INDEX
    Explanations
    NEGATIVE LOGITS
    "(ob"          -0.06
    "ाल"           -0.06
    ".Visible"     -0.06
    "ство"         -0.06
    "atta"         -0.06
    "zier"         -0.06
    "ibraltar"     -0.06
    " fleeing"     -0.06
    "dam"          -0.06
    " Celt"        -0.06
    POSITIVE LOGITS
    "вещ"              0.07
    ".until"           0.07
    ".words"           0.07
    " завдання"        0.07
    " Tonight"         0.07
    (token not shown)  0.06
    " paragraphs"      0.06
    " Specification"   0.06
    "보내기"           0.06
    "-special"         0.06
    Act Density 0.017%

    No Known Activations