    Explanations

    things that are followed by citations, examples or code

    Negative Logits
    " myſelf"        -1.07
    " Efq"           -1.02
    " Monfieur"      -1.00
    " raiſ"          -0.97
    " houſe"         -0.97
    " themſelves"    -0.97
    " whoſe"         -0.94
    " ſever"         -0.94
    " vPvB"          -0.94
    "tvguidetime"    -0.93

    Positive Logits
    ","               0.85
    " they"           0.56
    " we"             0.54
    " it"             0.53
    " main"           0.51
    "."               0.49
    " she"            0.47
    " Min"            0.47
    ":"               0.47
    ";"               0.47
    Act Density 0.599%

    No Known Activations