INDEX
    Explanations

    the word "because" indicating reasons or causes within the text

    Negative Logits
    rai
    -0.16
    hani
    -0.15
    adel
    -0.15
    idan
    -0.15
    uj
    -0.14
    enders
    -0.14
    γέν
    -0.14
    ชน
    -0.14
    ucc
    -0.14
    rani
    -0.14
    Positive Logits
     of
    0.48
     của
    0.30
    of
    0.23
    ของ
    0.19
     they
    0.18
    ủa
    0.18
    _of
    0.18
     της
    0.17
     των
    0.17
    	of
    0.17
    Act Density 0.057%

    No Known Activations