INDEX
    Negative Logits
    Token             Logit
    " pleaſure"       -0.65
    "genodigd"        -0.64
    " Infór"          -0.61
    " PagesJaunes"    -0.58
    " plufieurs"      -0.57
    " saveiro"        -0.57
    " queſta"         -0.57
    "ſelves"          -0.57
    " témoig"         -0.56
    " Polsek"         -0.56
    Positive Logits
    Token             Logit
    (blank)           1.16
    "most"            0.35
    "期刊论文"        0.35
    " dus"            0.34
    (blank)           0.33
    "capitalize"      0.32
    " ec"             0.31
    " ke"             0.31
    "Ec"              0.31
    "UnusedPrivate"   0.31
    Act Density 0.000%

    No Known Activations