INDEX
    Explanations

    proper nouns, specifically names of notable individuals

    Negative Logits
     nonzero        -0.19
    alem            -0.14
     mentioning     -0.14
     Afr            -0.14
     â̦             -0.14
     and            -0.14
     mentioned      -0.14
    aje             -0.13
     incl           -0.13
     [              -0.13
    Positive Logits
    itbart          0.18
    izontal         0.17
     adipiscing     0.17
    ikal            0.16
    κό              0.16
    stalk           0.15
    ώρα             0.15
    ardin           0.15
    avo             0.15
    bserv           0.14
    Act Density 0.000%

    No Known Activations