    Explanations

    Relative clauses introduced by "which"

    Negative Logits
    "،"  0.43
    (token missing)  0.31
    (token missing)  0.29
    (token missing)  0.24
    (token missing)  0.22
    "、$"  0.22
    (token missing)  0.22
    " perverse"  0.21
    " ،"  0.21
    " درون"  0.21
    Positive Logits
    "which"  0.37
    "as"  0.35
    " რომელიც"  0.34
    " which"  0.34
    " который"  0.34
    " který"  0.33
    " které"  0.32
    " которая"  0.32
    " które"  0.31
    " которое"  0.30
    Activation Density: 0.835%

    No Known Activations