    Explanations

    The word "because" and its variations, which signal reasoning or justification in statements

    Negative Logits
    符
    -0.17
    omnia
    -0.16
     Moran
    -0.14
    -Sah
    -0.14
    ewart
    -0.14
    ailed
    -0.14
    imenti
    -0.14
    ples
    -0.14
     Sle
    -0.14
     åº
    -0.13
    Positive Logits
    upo
    0.16
    аÑĢÑĩ
    0.16
    uni
    0.15
    usra
    0.15
    ilk
    0.14
    eton
    0.14
    retry
    0.14
    erno
    0.13
    reet
    0.13
    ania
    0.13
    Act Density 0.042%

    No Known Activations