INDEX
    Explanations

    words related to causality or consequence

    the word "hence" in various contexts

    Negative Logits
        " hitter"     -0.65
        " nurs"       -0.64
        "abies"       -0.64
        "estation"    -0.63
        "Bull"        -0.59
        "%-"          -0.58
        " batter"     -0.57
        " Tasman"     -0.56
        " battered"   -0.56
        " Mehran"     -0.56
    Positive Logits
        "forth"       2.11
        "forward"     1.45
        "noon"        0.85
        "oji"         0.78
        "far"         0.77
        " why"        0.77
        "hua"         0.77
        "alf"         0.76
        "rely"        0.75
        "ween"        0.75
    Activation Density: 0.014%

    No Known Activations