INDEX
    Explanations

    phrases indicating causation or reasoning

    Negative Logits
    Token            Logit
    "unächst"        -0.71
    "SharedDtor"     -0.67
    "OrBuilder"      -0.63
    "Демографія"     -0.62
    "rungsseite"     -0.60
    " iNdEx"         -0.59
    " langkah"       -0.59
    "comuna"         -0.59
    "ovací"          -0.59
    "pośred"         -0.57
    Positive Logits
    Token            Logit
    "Because"        1.24
    " because"       1.23
    "because"        1.18
    " Because"       1.16
    " BECAUSE"       1.06
    " karena"        1.05
    " Karena"        0.96
    " Sebab"         0.94
    "因为"           0.93
    " wegen"         0.93
    Activation Density: 1.552%

    No Known Activations