INDEX
    Explanations

    not trained on different

    Negative Logits
    .sav
    -0.11
    xea
    -0.09
     unchanged
    -0.09
     reput
    -0.09
    tsy
    -0.09
     unconventional
    -0.09
    ipa
    -0.09
     alike
    -0.09
    olie
    -0.09
     unusual
    -0.08
    Positive Logits
     separate
    0.32
     distinct
    0.32
     independent
    0.29
    独立
    0.27
     Separate
    0.26
     riêng
    0.25
    distinct
    0.23
     independ
    0.23
     отдель
    0.22
     independently
    0.21
    Act Density 0.127%

    No Known Activations