    Explanations

    phrases indicating a contrast or contradiction

    the word "how" in various contexts

    Negative Logits
    Token         Logit
    "uthor"       -0.65
    "agonists"    -0.63
    " Grail"      -0.62
    "lehem"       -0.61
    " Guer"       -0.58
    "iculture"    -0.57
    "ception"     -0.57
    "agonist"     -0.56
    "isher"       -0.56
    " Yard"       -0.56
    Positive Logits
    Token         Logit
    "soever"      1.08
    "HCR"         0.86
    "beit"        0.86
    "ever"        0.82
    "ls"          0.82
    "ling"        0.81
    "ells"        0.77
    "itzer"       0.76
    " much"       0.75
    " MUCH"       0.74
    Activation Density: 0.084%

    No Known Activations