INDEX
    Explanations

    phrases indicating causation or reasons for consequences

    Negative Logits
    Token        Logit
    "ewith"      -0.15
    "idi"        -0.15
    "usa"        -0.14
    "akov"       -0.14
    "idis"       -0.14
    "haven"      -0.13
    " Prefer"    -0.13
    "ux"         -0.13
    "auce"       -0.13
    "ÙĨاء"       -0.13
    Positive Logits
    Token          Logit
    " partially"   0.35
    " partly"      0.34
    " party"       0.28
    " least"       0.24
    "part"         0.23
    " largely"     0.22
    " Party"       0.22
    " atleast"     0.22
    " جزئ"         0.21
    " Partial"     0.21
    Act Density 0.073%

    No Known Activations