    Explanations

    words related to time, specifically when something happens or in what sequence

    instances of causal relationships or conditional statements

    Negative Logits
    Token            Logit
    "Indeed"         -0.78
    "Consider"       -0.72
    " Consider"      -0.71
    " Principles"    -0.70
    "atories"        -0.66
    "Yet"            -0.64
    "ashington"      -0.63
    " Indeed"        -0.63
    "ģĸ"             -0.63
    "virt"           -0.62
    Positive Logits
    Token            Logit
    " haha"          1.01
    " everybody"     0.96
    " I"             0.95
    " somebody"      0.95
    " you"           0.94
    " guys"          0.93
    " didnt"         0.92
    " guy"           0.88
    " he"            0.85
    " stuff"         0.84
    Act Density 0.493%

    No Known Activations