INDEX
    Explanations

    phrases that indicate cause and effect relationships

    Negative Logits
     itſelf     -1.36
     Efq        -1.34
     myſelf     -1.23
     houſe      -1.22
     whoſe      -1.21
     Anſ        -1.19
     purpoſe    -1.19
     ſtate      -1.17
     Houſe      -1.16
     himſelf    -1.15
    Positive Logits
     the        1.64
     a          1.18
     an         1.09
     their      0.88
     our        0.85
     those      0.84
     these      0.81
     your       0.81
     some       0.80
     this       0.80
    Activation Density 1.455%

    No Known Activations