INDEX
    Explanations
    Negative Logits
        " feroit"      -1.10
        " auroit"      -1.07
        " Houſe"       -1.02
        " pouvoit"     -1.02
        " myſelf"      -1.01
        " ſtate"       -1.00
        " ainfi"       -1.00
        " houſe"       -1.00
        " Monfieur"    -0.98
        " greateſt"    -0.96
    Positive Logits
        " the"          0.69
        "↵↵"            0.57
        "ary"           0.56
        "."             0.56
                        0.56
        " ne"           0.54
                        0.54
        " I"            0.54
        " ("            0.53
        "ally"          0.53
    Act Density 0.759%

    No Known Activations