    NEGATIVE LOGITS
    ".Database"     -0.08
    "ист"           -0.07
    ".rem"          -0.06
    " собствен"     -0.06
    " doctoral"     -0.06
    " database"     -0.06
    "_dev"          -0.06
    " rdr"          -0.06
    "lm"            -0.06
    " ارتباط"       -0.06
    POSITIVE LOGITS
    " stif"         0.06
    "ivirus"        0.06
    "ningen"        0.06
    "blocks"        0.06
    "iế"            0.06
    " Curriculum"   0.06
    "FAILURE"       0.06
    "وده"           0.06
    " nied"         0.06
    "LARI"          0.06
    Activation Density: 0.010%

    No Known Activations