INDEX
    Explanations
    No Explanations Found
    NEGATIVE LOGITS
    ليه  0.94
    ă  0.91
     painkillers  0.90
    ת  0.90
    0.90
    endenti  0.89
     squadre  0.89
    ingresso  0.88
     eléctrico  0.87
    دة  0.86
    POSITIVE LOGITS
     Entre  0.78
     head  0.76
     Press  0.74
     la  0.73
     product  0.73
     Story  0.73
    iz  0.71
    y  0.71
     pi  0.70
     unfolding  0.70
    Act Density 0.001%

    No Known Activations