    Explanations

    mentions of the word "wer" with varying levels of activation

    the word "wer" in various forms, signaling a focus on variations of that term

    Negative Logits
     dogs
    -0.64
     ghosts
    -0.64
     PTSD
    -0.64
     accompanying
    -0.64
     Jinping
    -0.62
     makeup
    -0.62
     parenting
    -0.61
     esp
    -0.60
    ●
    -0.60
     paramedics
    -0.60
    Positive Logits
    wer
    4.76
    WER
    1.65
    wered
    1.31
    wark
    1.18
     Wer
    1.14
    ws
    1.12
    swer
    1.11
    wen
    1.10
    wed
    1.04
    w
    1.04
    Act Density 0.011%

    No Known Activations