INDEX
    Explanations
    NEGATIVE LOGITS
     Efq      -1.01
     Houſe    -1.00
     Jefus    -0.99
     myſelf   -0.97
     purpoſe  -0.96
     itſelf   -0.94
     ſta      -0.93
     houſe    -0.92
    jspx      -0.91
     poffe    -0.91
    POSITIVE LOGITS
    ↵↵        0.60
    '         0.58
     D        0.50
    a         0.49
    /         0.47
    .         0.47
    i         0.45
    <eos>     0.45
     to       0.44
     and      0.44
    Activation Density: 0.019%

    No Known Activations