    NEGATIVE LOGITS
    -1.09
      
    -0.94
    -
    -0.91
    -0.88
    <eos>
    -0.85
     (
    -0.85
       
    -0.84
     The
    -0.76
     P
    -0.75
    ↵↵
    -0.74
    POSITIVE LOGITS
     myſelf     1.92
     houſe      1.84
     itſelf     1.74
     Efq        1.71
    ſelf        1.67
     Reſ        1.67
     Houſe      1.66
     raiſ       1.63
     purpoſe    1.63
     Anſ        1.62
    Activation Density: 1.117%

    No Known Activations