INDEX
    Explanations
    Negative Logits
    ries            -0.91
                    -0.61
     ri             -0.57
    ried            -0.55
    <eos>           -0.52
     pre            -0.50
    RIES            -0.50
    devtools        -0.49
    rying           -0.48
     out            -0.48
    Positive Logits
     myſelf         1.05
     pleaſure       1.03
    ſelf            1.00
    ſelves          0.98
     themſelves     0.94
     purpoſe        0.94
     ſche           0.93
     houſe          0.91
     estekak        0.91
     Monfieur       0.91
    Act Density 0.081%

    No Known Activations