INDEX
    Explanations

    phrases indicating cause and effect

    connecting words that indicate causation or consequence

    Negative Logits (token shown in quotes; a leading space is part of the token)
        "quit"              -0.77
        "RAW"               -0.68
        "ctions"            -0.65
        " Swim"             -0.65
        "wait"              -0.64
        "wl"                -0.64
        "rete"              -0.63
        "cit"               -0.63
        " boarded"          -0.63
        "Submit"            -0.63
    Positive Logits
        " preventing"        1.66
        " reducing"          1.66
        " enabling"          1.59
        " facilitating"      1.55
        " allowing"          1.55
        " enhancing"         1.54
        " ensuring"          1.53
        " eliminating"       1.50
        " boosting"          1.49
        " preserving"        1.48
    Activation Density: 0.255%
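Activation density is usually read as the percentage of tokens on which the feature fires, so 0.255% corresponds to roughly 2.55 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the function name `activation_density` are hypothetical, and the sample data is made up.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 10 firing tokens out of 4,000 gives a density of 0.25%.
    sample = [0.0] * 3990 + [1.0] * 10
    print(activation_density(sample))  # 0.25
```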

    No Known Activations