INDEX
    Explanations

    punctuation marks and formatting symbols in the text

    Negative Logits
    ' greateſt'       -1.13
    ' itſelf'         -1.10
    ' purpoſe'        -1.02
    ' pleaſure'       -1.01
    ' myſelf'         -1.00
    ' themſelves'     -0.99
    ' ſever'          -0.98
    ' fubject'        -0.96
    ' Reſ'            -0.96
    ' ſind'           -0.94

    Positive Logits
    '↵↵'               0.96
    '↵↵↵'              0.77
    ' The'             0.67
    '</h3>'            0.65
    '</blockquote>'    0.64
                       0.60
    ' "'               0.58
    '↵↵↵↵'             0.57
    ' or'              0.55
    ')'                0.55
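The dashboard does not say how these logit values are produced. A common convention for sparse-autoencoder dashboards is to take the dot product of the feature's decoder direction with each token's unembedding vector and then list the most negative and most positive results. The sketch below assumes that convention; the function names `logit_effects` and `top_and_bottom` and all vector values are hypothetical toy data, not taken from the model behind this page.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_direction: Sequence[float],
                  unembedding: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed convention: effect of a feature on each token's logit is the
    dot product of its decoder direction with that token's unembedding vector."""
    effects: Dict[str, float] = {}
    for token, column in unembedding.items():
        if len(column) != len(decoder_direction):
            raise ValueError(f"dimension mismatch for token {token!r}")
        effects[token] = sum(d * u for d, u in zip(decoder_direction, column))
    return effects


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Hypothetical 2-dimensional toy vectors, for illustration only.
direction = [1.0, -0.5]
unembed = {
    "\n\n": [0.9, -0.2],
    " The": [0.5, 0.0],
    " itſelf": [-0.8, 0.6],
    " purpoſe": [-0.4, 0.2],
}
neg, pos = top_and_bottom(logit_effects(direction, unembed), k=2)
print("negative:", neg)
print("positive:", pos)
```

With a real model the same ranking would run over the full vocabulary, which is how long-s tokens such as ' itſelf' can surface at the negative end while newline and markup tokens surface at the positive end.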

    Activation Density 0.586%
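Activation density is usually read as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the threshold default are illustrative assumptions, not this dashboard's documented method. As a worked check of the arithmetic only, a hypothetical feature active on 3 of 512 tokens gives 3 / 512 = 0.5859375%, which displays as 0.586%.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when activation > threshold."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 512 token positions, the feature fires on 3 of them.
acts = [0.0] * 512
acts[10] = acts[200] = acts[511] = 1.5
print(f"{activation_density(acts):.3f}%")
```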

    No Known Activations