    Explanations

    sentence beginnings indicating question, calculation, or assumption

    Questions and instructions

    Negative Logits
    Token            Logit
    "<bos>"          -1.62
    " itſelf"        -1.23
    " myſelf"        -1.09
    "ſelf"           -1.08
    " Jefus"         -1.00
    " Efq"           -1.00
    " himſelf"       -0.94
    " themſelves"    -0.91
    " pleaſure"      -0.91
    " enfans"        -0.90
    Positive Logits
    Token            Logit
    "?"              0.59
    ","              0.57
    " ("             0.56
    " dar"           0.56
    " ="             0.56
    ";"              0.56
    ":"              0.56
    ")"              0.54
    " un"            0.53
    "("              0.52
    Activation Density: 3.755%

    No Known Activations