    Negative Logits
    Token            Logit
    (blank)          -0.08
    "392"            -0.08
    " Protestant"    -0.07
    "gid"            -0.07
    "正确"           -0.07
    " Gladi"         -0.07
    (blank)          -0.07
    "onite"          -0.07
    " Malware"       -0.07
    (blank)          -0.07
    Positive Logits
    Token            Logit
    " everytime"     0.10
    " tránh"         0.09
    "Avoid"          0.09
    " variability"   0.09
    " evitando"      0.09
    " synonyms"      0.09
    "variation"      0.09
    " evitare"       0.08
    " избег"         0.08
    " decât"         0.08
    Activation Density: 0.017%

    No Known Activations