    Negative Logits
     dialog     -0.08
     DAL        -0.08
     було       -0.08
     merged     -0.08
    .dialog     -0.07
     вп         -0.07
    Dialog      -0.07
     revoir     -0.07
    -quality    -0.07
    Merged      -0.07

    Positive Logits
    э            0.08
    roud         0.08
     losing      0.07
     regimes     0.07
    ufu          0.07
     Britney     0.07
    ಗಾಗಿ         0.07
    роф          0.07
     nets        0.07
     ব্র          0.07

    Activation Density: 0.004%

    No Known Activations