    Explanations

    mentions of specific names or proper nouns related to individuals

    Negative Logits
    Token         Logit
    er            -0.27
    o             -0.26
    y             -0.20
    een           -0.19
    ing           -0.19
    oxy           -0.18
    otic          -0.17
    oq            -0.16
    eria          -0.16
    echn          -0.16
    Positive Logits
    Token         Logit
    ipeg          0.26
    ings          0.25
    sylvania      0.23
    nn            0.23
    ovation       0.22
    ibal          0.22
    ery           0.22
    ounced        0.21
    iversary      0.21
    egan          0.20
    Activation Density: 0.030%

    No Known Activations