    Explanations

    the word "the" with a high activation

    instances of the word "the."

    Negative Logits
    anni     -0.80
    lich     -0.78
    -+-+     -0.78
    ambo     -0.76
    alde     -0.75
    cade     -0.72
    tu       -0.72
    den      -0.71
    alloc    -0.69
    ploy     -0.68
    Positive Logits
     plunge         1.42
     brunt          1.31
     initiative     1.16
     bait           1.16
     reins          1.16
     opportunity    1.14
     helm           1.13
     liberty        1.13
     precaution     1.05
     leap           1.03
    Activation Density: 0.044%

    No Known Activations