    NEGATIVE LOGITS
    Token                  Value
    "ي"                    0.84
    "y"                    0.82
    "sax"                  0.81
    (token not shown)      0.79
    "oretically"           0.76
    " quoi"                0.74
    "י"                    0.73
    (token not shown)      0.73
    "ি"                    0.72
    "yers"                 0.70
    POSITIVE LOGITS
    Token                  Value
    " эту"                 0.89
    " tejto"               0.88
    (token not shown)      0.80
    " بر"                  0.79
    " sofern"              0.77
    "こちらは"              0.77
    " würde"               0.75
    "ٿ"                    0.75
    " ئەم"                 0.75
    " Tämä"                0.75
    Activation Density: 0.038%

    No Known Activations