    Negative Logits
     I          -0.68
                -0.60
    ↵↵          -0.55
     "          -0.52
    <eos>       -0.51
     an         -0.50
     a          -0.48
    ,           -0.47
     you        -0.46
     las        -0.45
    Positive Logits
    ſelves      1.48
     Houſe      1.44
     myſelf     1.44
     Efq        1.43
     ſtate      1.40
     itſelf     1.38
    ſelf        1.34
     Reſ        1.34
     Anſ        1.34
     purpoſe    1.32
    Activation Density: 0.095%

    No Known Activations