INDEX
    Explanations

    causal relationships or explanations

    "Because" at the beginning of a sentence

    because introducing explanation

    Negative Logits
    Token         Logit
    "ſelf"        -0.94
    " ſeveral"    -0.78
    " Majefty"    -0.77
    " Efq"        -0.77
    "ValueStyle"  -0.76
    " himſelf"    -0.76
    "ſelves"      -0.75
    " myſelf"     -0.74
    " ſta"        -0.73
    " Houſe"      -0.73

    Positive Logits
    Token         Logit
    " they"       1.12
    " it"         1.01
    " we"         0.95
    " Because"    0.85
    " there"      0.84
    "Because"     0.81
    "ECAUSE"      0.80
    " unlike"     0.78
    " of"         0.77
    "unlike"      0.76

    Activation Density: 0.095%

    No Known Activations