INDEX
    Explanations

    the word "her" with high activation

    the word "her" in various contexts

    Negative Logits
    Token               Logit
    " Reconstruction"   -0.69
    " Governments"      -0.61
    " cancell"          -0.56
    " TTC"              -0.55
    " stopp"            -0.55
    " Enhance"          -0.53
    " Origins"          -0.53
    " tilt"             -0.53
    "CVE"               -0.53
    " Quarterly"        -0.52
    Positive Logits
    Token       Logit
    "her"       4.74
    "hers"      3.26
    "HER"       2.45
    "hes"       2.08
    "hest"      2.02
    "hed"       1.72
    "hel"       1.68
    "hen"       1.63
    "heres"     1.61
    "here"      1.56
    Act Density 0.014%

    No Known Activations