    Explanations

    contractions and possessives

    Negative Logits
    Token                Value
    "ת"                  0.38
    "د"                  0.35
    "zelfde"             0.34
    "u"                  0.34
    (token not shown)    0.31
    "ל"                  0.31
    "תה"                 0.30
    "ي"                  0.29
    "ع"                  0.29
    (token not shown)    0.29
    Positive Logits
    Token                Value
    " are"               0.27
    " "                  0.26
    " uprising"          0.22
    " à"                 0.22
    "I"                  0.21
    " iodine"            0.21
    " تا"                0.21
    " de"                0.21
    (token not shown)    0.21
    "Heap"               0.21
    Act Density 0.012%

    No Known Activations