    Explanations

    words or phrases related to reasons or causes

    instances of the word "because" indicating causal relationships

    Negative Logits
    Token     Logit
    "mint"    -0.74
    "wn"      -0.71
    "yan"     -0.71
    "Gas"     -0.70
    "alin"    -0.70
    "agin"    -0.69
    "lem"     -0.69
    "ries"    -0.67
    "ymph"    -0.66
    "€"       -0.65
    Positive Logits
    Token           Logit
    " they"         0.86
    "*/("           0.79
    "ecause"        0.69
    " otherwise"    0.67
    " someone"      0.67
    " THEY"         0.66
    "akening"       0.65
    "ority"         0.65
    " mistakenly"   0.64
    " there"        0.64
    Activation Density: 0.080%

    No Known Activations