INDEX
    Explanations

    specific words that refer to actions or objects, particularly nouns

    conditional phrases suggesting dependence or consequence

    Negative Logits
    " AMA"          -0.54
    " Medals"       -0.54
    " contrad"      -0.54
    " Caucus"       -0.52
    " rhet"         -0.52
    " Haas"         -0.52
    " Hung"         -0.51
    " disav"        -0.51
    " ethn"         -0.50
                    -0.50
    Positive Logits
    "pires"         0.82
    " accompanies"  0.79
    "caster"        0.68
    "itiveness"     0.67
    "older"         0.67
    "ivalry"        0.64
    "pired"         0.64
    "wegian"        0.63
    "Fast"          0.63
    "OULD"          0.62
    Activation Density: 0.942%

    No Known Activations