    Negative Logits
    ertian     0.38
     SLASH     0.37
    Slash      0.37
    Arquivo    0.36
    த்திரை     0.35
    slash      0.34
               0.34
    FAO        0.34
    ͝          0.34
     rdf       0.33
    Positive Logits
     moral     1.93
     Moral     1.83
    Moral      1.82
    moral      1.75
     мора      1.64
     morally   1.38
    道德       1.38
     नैतिक     1.18
     morals    1.11
     নৈতিক     1.10
    Activation Density: 0.034%

    No Known Activations