    Negative Logits
     itſelf         -1.06
     pleaſure       -1.00
     fubject        -1.00
     purpoſe        -0.99
     myſelf         -0.94
     themſelves     -0.87
     himſelf        -0.87
     Monfieur       -0.87
     juſt           -0.85
    évaluateur      -0.83
    Positive Logits
    satunya          0.74
    <eos>            0.64
     other           0.59
     The             0.56
     enige           0.54
     *__             0.53
     Is              0.51
     What            0.51
    ogen             0.50
     Ge              0.50
    Activation Density: 0.055%

    No Known Activations