INDEX
    Explanations

    conjunctions and pronouns

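    Negative Logits
    '،'                   0.43   (Arabic comma)
    (token not rendered)  0.42
    (token not rendered)  0.38
    '.'                   0.38
    (token not rendered)  0.37
    ' ،'                  0.37   (space + Arabic comma)
    ' ,'                  0.34
    '۔'                   0.28   (Urdu full stop)
    '",'                  0.28
    (token not rendered)  0.27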
    Positive Logits
    ' we'        0.43
    ' it'        0.38
    ' when'      0.33
    ' they'      0.32
    ' although'  0.32
    'we'         0.32
    ' ketika'    0.32   (Indonesian for "when")
    'when'       0.31
    ' aunque'    0.29   (Spanish for "although")
    ' you'       0.28
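The positive and negative logit lists rank vocabulary tokens by how strongly this feature's output direction pushes their logits up or down. A minimal sketch of such a ranking follows, assuming (hypothetically) that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The names `decoder_dir`, `unembed`, `logit_effects`, and `top_logits` are illustrative, and the dashboard that produced these lists may normalize or scale the scores differently.

```python
# Hypothetical sketch: rank tokens by the direct logit effect of one feature.
# Assumption (not confirmed by the dashboard): effect[t] is the dot product of
# the feature's decoder direction with column t of the unembedding matrix.


def logit_effects(decoder_dir, unembed):
    """Return one score per vocabulary token.

    decoder_dir: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each a list of vocab_size floats.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[d] * unembed[d][t] for d in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(decoder_dir, unembed, vocab, k=10):
    """Return (positive, negative) lists of (token, score), strongest first."""
    effects = logit_effects(decoder_dir, unembed)
    # Token indices sorted from most negative to most positive score.
    order = sorted(range(len(vocab)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative
```

With toy inputs, `top_logits([1.0, 0.5], [[0.4, 0.3, -0.4, -0.3], [0.06, 0.1, -0.06, -0.1]], [" we", " it", "،", "."], k=2)` ranks `' we'` and `' it'` as positive and `'،'` and `'.'` as negative, loosely mirroring the pattern above. Scores here are signed; the lists above show magnitudes only.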
    Activation Density: 0.128%
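Activation density is typically the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0; the threshold and the function name `activation_density` are hypothetical choices, since the dashboard's exact criterion and token sample are not stated here.

```python
# Hypothetical sketch: activation density of one feature, in percent.
# Assumption: a token counts as "active" when its activation is strictly
# greater than `threshold` (0.0 by default); the dashboard's exact rule and
# token sample are not stated.


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

A density of 0.128% corresponds to about 128 active tokens per 100,000, or roughly one token in 781 (100 / 0.128 ≈ 781), so this feature is quite sparse.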

    No Known Activations