INDEX
    Explanations

    code structures and syntactical elements

    Negative Logits
     purpoſe    -1.04
     pleaſure   -0.99
     Theſe      -0.97
     myſelf     -0.96
     uſe        -0.94
     houſe      -0.93
     iſt        -0.93
     ſtate      -0.92
     leaſt      -0.92
     ſeveral    -0.92
    Positive Logits
     and      0.99
    ,         0.86
     or       0.77
     in       0.75
     which    0.75
     as       0.69
     at       0.66
     for      0.65
     also     0.63
     on       0.62
    Act Density 0.725%

    No Known Activations