    Explanations

    proper nouns, particularly names of individuals

    Negative Logits
    Token      Logit
    unda       -0.63
    ge         -0.62
    atively    -0.60
    atical     -0.57
    opsis      -0.57
    gments     -0.55
    ctors      -0.53
    ctor       -0.53
    arily      -0.53
    gebra      -0.53
    Positive Logits
    Token      Logit
    hips       0.79
    hip        0.75
    hops       0.72
    '          0.65
    pring      0.62
    mith       0.62
    peed       0.60
    hire       0.59
    boro       0.59
    heet       0.59
    Activation Density: 7.533%

    No Known Activations