    Explanations

    the word "our" with a high activation value

    possessive pronouns indicating ownership or belonging

    Negative Logits
    Token           Logit
    " silent"       -0.61
    " glitch"       -0.60
    " missing"      -0.57
    " haz"          -0.56
    " nod"          -0.56
    " crossover"    -0.56
    " undead"       -0.55
    " intellig"     -0.55
    " losers"       -0.55
    " loser"        -0.55

    Positive Logits
    Token           Logit
    "our"           4.54
    "ours"          2.97
    "ouring"        2.61
    "oured"         2.54
    "OUR"           2.54
    "orous"         1.46
    "orously"       1.41
    "ourn"          1.37
    "ourage"        1.29
    "bour"          1.27

    Activation Density: 0.018%

    No Known Activations