INDEX
    Explanations

    References to individuals, particularly their names and titles

    Negative Logits
        hift        -0.18
        ingroup     -0.16
        oux         -0.16
        yaml        -0.16
        catch       -0.15
        owo         -0.15
        ython       -0.15
        heet        -0.14
        衣          -0.14
        root        -0.14
    Positive Logits
        eam          0.17
        udiant       0.15
        奇           0.15
        階           0.15
        ेख           0.15
        ulton        0.15
        alc          0.14
        śnie         0.14
        phant        0.14
        spb          0.14
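    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). The page does not say how these values were computed; a common approach, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the k lowest and k highest entries. The sketch below does this in pure Python on tiny made-up vectors; `decoder_dir`, `unembed`, `vocab`, `logit_effects`, and `top_and_bottom` are hypothetical names, not part of any real library.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix with d_model rows and vocab_size columns.
    Returns one logit effect per vocabulary entry."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(
    vocab: List[str], effects: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest values first
    positive = ranked[::-1][:k]  # highest values first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    vocab = ["a", "b", "c", "d"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [0.2, -0.1, 0.3, 0.0],
        [0.4, 0.2, -0.2, 0.1],
    ]
    neg, pos = top_and_bottom(vocab, logit_effects(decoder_dir, unembed), k=2)
    print([t for t, _ in neg], [t for t, _ in pos])  # ['b', 'd'] ['c', 'a']
```

    On a real model the unembedding has tens of thousands of columns, which is why the lists above contain sub-word fragments such as "hift" and "ython" rather than whole words.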
    Activation Density: 0.031%
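    A density of 0.031% corresponds to roughly 31 active positions per 100,000 tokens, so this is a sparse feature. Assuming density means the share of sampled tokens on which the feature's activation is above zero (the page does not define it), it can be computed as below. `activation_density` and its `threshold` parameter are hypothetical names used only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation is strictly above `threshold`.

    Assumption: "active" means activation > threshold, with threshold 0.0 by default.
    """
    if len(activations) == 0:
        raise ValueError("activation_density needs at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 31 active positions out of 100,000 tokens.
    sample = [0.0] * 99_969 + [1.0] * 31
    print(activation_density(sample))  # prints 0.031
```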

    No Known Activations