    Explanations

    quotes and descriptions

    Negative Logits
    <h3>            1.45
    <h2>            1.33
    <h4>            1.22
    <blockquote>    1.11
    Read            1.05
    READ            1.04
    <h1>            1.02
    Reading         0.99
    <h5>            0.98
    Each            0.98
    Positive Logits
     looks          1.31
     indeed         1.21
     Looks          1.13
     surely         1.12
     undoubtedly    1.03
                    1.02
    looks           1.00
     ends           1.00
     Indeed         0.98
     “…             0.95
    Act Density 0.001%

    No Known Activations