    Negative Logits
    token | logit
     ی | 0.32
     quando | 0.31
     cuando | 0.31
     hatta | 0.29
    ن | 0.29
    ", | 0.28
     , | 0.28
     vacanam | 0.28
    ش | 0.28
     pusieron | 0.28
    Positive Logits
    token | logit
     being | 0.40
     acknowledging | 0.38
     also | 0.38
     it | 0.35
     maintaining | 0.34
    being | 0.34
     there | 0.34
     simultaneously | 0.33
     ايضا | 0.33
    [no token text] | 0.33
    Activation Density: 0.013%

    No Known Activations