    Explanations

    proper nouns, particularly names or brands

    Negative Logits
    Token       Logit
    "eling"     -0.19
    "elli"      -0.18
    "idon"      -0.17
    "elly"      -0.17
    "elle"      -0.17
    "elson"     -0.16
    "es"        -0.15
    "ese"       -0.15
    "ellan"     -0.15
    "et"        -0.15
    Positive Logits
    Token       Logit
    "entine"    0.28
    "entina"    0.26
    "uable"     0.25
    "uation"    0.24
    "leys"      0.23
    "entin"     0.23
    "uations"   0.23
    "uetype"    0.23
    " val"      0.22
    "=val"      0.21
    Act Density 0.025%

    No Known Activations