INDEX
    Explanations

    proper nouns, particularly names of individuals

    Negative Logits
    umlu        -0.19
    undos       -0.18
    tog         -0.16
    ization     -0.15
    /or         -0.15
    orp         -0.15
    rops        -0.14
    /vnd        -0.14
     baiser     -0.14
    theless     -0.14

    Positive Logits
    boy         0.17
    boys        0.16
    ãĢħ         0.15
    ιÏĩ         0.15
    æ¸Ī         0.14
    sik         0.14
    elow        0.13
    Variant     0.13
    sy          0.13
    erea        0.13
    Act Density 0.287%

    No Known Activations