INDEX
    Explanations

    instances where the text discusses a singular, specific item or topic among a set of choices

    Negative Logits
    Token        Logit
    "etz"        -0.80
    "storms"     -0.74
    "des"        -0.73
    "gnu"        -0.69
    "redits"     -0.69
    "illus"      -0.68
    "invoke"     -0.68
    "skirts"     -0.68
    "mire"       -0.66
    "ruary"      -0.66
    Positive Logits
    Token             Logit
    " thing"          1.36
    " conceivable"    1.24
    " reason"         1.24
    " exception"      1.20
    " way"            1.19
    " remaining"      1.17
    " viable"         1.13
    " downside"       1.11
    " sane"           1.10
    " difference"     1.09
    Activation Density: 0.051%

    No Known Activations