INDEX
    Explanations

    'the' followed by a noun

    Negative Logits
    "ون"  0.53
    "f"  0.53
    "2"  0.53
    [blank]  0.50
    "ل"  0.50
    "st"  0.49
    "1"  0.48
    "For"  0.47
    "خدام"  0.46
    "7"  0.44
    Positive Logits
    " to"  0.80
    " is"  0.56
    " on"  0.49
    " it"  0.47
    " oppure"  0.46
    " helpen"  0.45
    " at"  0.45
    "<unused2213>"  0.45
    " на"  0.42
    "onu"  0.42
    Activation Density: 0.004%

    No Known Activations