    Explanations

    names of historical figures

    Negative Logits
     bothering      -0.72
    higher          -0.66
    Untitled        -0.65
    Ĥ¬              -0.63
     pregn          -0.62
    thinking        -0.61
    intent          -0.60
    veyard          -0.60
    necessary       -0.59
    giving          -0.59
    Positive Logits
    'll             1.13
     underwent      1.02
     participated   1.01
     graduated      1.01
    'd              1.00
    pherd           1.00
     oversaw        1.00
     survived       0.99
     earns          0.96
     scored         0.96
    Activation Density: 0.355%

    No Known Activations