INDEX
    Explanations

    the word "because" and its variations, indicating causal explanations or reasons

    Negative Logits
    Token       Logit
    "them"      -0.19
    "ively"     -0.16
    "ingt"      -0.15
    "ERCHANT"   -0.15
    "и"         -0.15
    "him"       -0.15
    "tron"      -0.15
    "ors"       -0.15
    "ship"      -0.14
    " pues"     -0.14
    Positive Logits
    Token        Logit
    " they"      0.24
    " unlike"    0.24
    " there"     0.23
    " latter"    0.22
    " nothing"   0.22
    " nobody"    0.22
    " we"        0.20
    " it"        0.20
    " although"  0.19
    " no"        0.19
    Act Density 0.094%

    No Known Activations