INDEX
    Explanations

    references to temporal relationships and conditions in a context

    Negative Logits
    Token        Logit
    " as"        -0.78
    " all"       -0.74
    " get"       -0.73
    " in"        -0.72
    " where"     -0.71
    " no"        -0.69
    " can"       -0.68
    " for"       -0.68
    " a"         -0.68
    " is"        -0.66
    Positive Logits
    Token        Logit
    "also"       1.36
    "then"       1.33
    "now"        1.27
    "been"       1.27
    "there"      1.26
    "because"    1.25
    "since"      1.24
    " itſelf"    1.23
    "again"      1.23
    " Theſe"     1.23
    Activation Density: 0.359%

    No Known Activations