INDEX
    Explanations

    signals indicating causation or explanation

    the word "because" used to introduce explanations or reasons

    Negative Logits
    "uttered"      -0.71
    "intern"       -0.71
    "lem"          -0.69
    " exting"      -0.67
    "ée"           -0.67
    "ymph"         -0.61
    "Gas"          -0.60
    "abal"         -0.60
    "SPONSORED"    -0.58
    " pione"       -0.58
    Positive Logits
    "rely"         1.08
    " of"          0.72
    " OF"          0.67
    " nobody"      0.67
    " there"       0.66
    " humans"      0.63
    " we"          0.63
    " these"       0.62
    "Of"           0.61
    " hindsight"   0.61
    Activation Density 0.063%

    No Known Activations