    Explanations

    proper nouns, particularly names of people

    Negative Logits
    Token         Logit
    pid           -0.17
    eec           -0.15
    lli           -0.15
    vement        -0.15
    orrow         -0.15
    ãĥªãĥ¼ãĤº     -0.14
    ="__          -0.14
    allon         -0.14
    utility       -0.14
    tery          -0.14

    Positive Logits
    Token         Logit
    mann          0.41
    berg          0.33
    inger         0.30
    acher         0.29
    berger        0.29
    hammer        0.29
    heimer        0.28
    auer          0.28
    feld          0.28
    me            0.27

    Activation Density: 0.213%

    No Known Activations