    Explanations

    phrases indicating partial reasons or contributions

    Negative Logits
    Token                 Logit
    " mostly"             -0.23
    " mainly"             -0.21
    " solely"             -0.20
    " primarily"          -0.19
    "sole"                -0.19
    "inkel"               -0.18
    "mostly"              -0.17
    " sole"               -0.17
    " either"             -0.17
    " principalmente"     -0.17
    Positive Logits
    Token                 Logit
    " due"                0.21
    " because"            0.19
    " Due"                0.19
    "due"                 0.19
    " responsible"        0.18
    "çͱäºİ"               0.17
    "Because"             0.17
    " Because"            0.16
    "because"             0.16
    "Due"                 0.16
    Activation Density: 0.035%

    No Known Activations