INDEX
    Explanations

    names of individuals

    proper nouns, specifically names

    Negative Logits
     lawy         -0.68
    Reviewer      -0.67
    irlf          -0.66
     Flavoring    -0.66
    glers         -0.64
    withstanding  -0.62
    */(           -0.62
    avorite       -0.61
    footed        -0.61
    cause         -0.60
    Positive Logits
    ette          0.75
     Wynne        0.73
    idge          0.71
    enne          0.71
    atis          0.68
    ettes         0.67
    opa           0.65
    gain          0.65
    ello          0.64
    illo          0.64
    Act Density 0.091%

    No Known Activations