INDEX
    Negative Logits
     تضيفلها      -0.69
    oneofs        -0.66
     moindre      -0.59
                  -0.54
     viendra      -0.53
     veille       -0.52
     consultato   -0.51
     cerchi       -0.50
     pescoço      -0.50
     poussière    -0.50
    Positive Logits
     device       0.91
     project      0.84
     event        0.83
     item         0.81
     tool         0.81
     program      0.77
     platform     0.74
     experiment   0.74
     component    0.73
     loophole     0.73
    Activation Density: 0.002%

    No Known Activations