    Negative Logits
    Token                 Logit
    " better"             -0.66
    "better"              -0.66
    "Better"              -0.63
    " meglio"             -0.61
    "RectangleBorder"     -0.57
    "SpringBootTest"      -0.56
    " mieux"              -0.54
    " melhor"             -0.52
    " typelib"            -0.52
                          -0.52
    Positive Logits
    Token                 Logit
    " different"          0.81
    " Different"          0.75
    "different"           0.73
    "Different"           0.70
    " DIFFERENT"          0.66
    " verschied"          0.60
    " diferente"          0.59
    " différents"         0.59
    "epiece"              0.58
    "不同的"              0.58
    Act Density 0.003%

    No Known Activations