    Negative Logits
     vrais          -0.56
     amitié         -0.54
     noDo           -0.54
     république     -0.50
     étrangères     -0.49
     setuptools     -0.48
     cuarzo         -0.48
     subsequence    -0.48
     démocratie     -0.48
    Gön             -0.47
    Positive Logits
    énario           0.63
    awtextra         0.59
    Diweddarwch      0.59
    AsUp             0.57
     otomatig        0.56
    +#+#             0.56
    hoeddwyd         0.53
     hire            0.51
     mould           0.49
     بيها            0.47
    Activation Density: 0.003%

    No Known Activations