INDEX
    Explanations

    words related to issues, problems, or potential threats

    occurrences of the word "were."

    Negative Logits
    Token             Logit
    "dom"             -0.65
    " Matter"         -0.65
    "iates"           -0.64
    " Defeat"         -0.62
    " Raise"          -0.62
    "otic"            -0.62
    "ledge"           -0.61
    "oire"            -0.61
    "place"           -0.58
    "lag"             -0.58

    Positive Logits
    Token             Logit
    "wolves"           1.53
    "wolf"             1.29
    " able"            0.97
    "nt"               0.96
    " supposed"        0.88
    " instrumental"    0.87
    "hes"              0.85
    " originally"      0.85
    "hers"             0.84
    " greeted"         0.83
    Activation Density: 0.223%

    No Known Activations