    Explanations

    proper nouns and specific names

    Negative Logits
    " Hob"        -0.15
    "ụn"          -0.14
    " Zhu"        -0.12
    "رÙĪØ¯"       -0.12
    ".zh"         -0.11
    " Elsa"       -0.11
    " Tibetan"    -0.11
    "krv"         -0.10
    "ë¶"          -0.10
    " Spokane"    -0.10
    Positive Logits
    " Cox"          0.69
    " ox"           0.66
    "ox"            0.65
    " Sax"          0.64
    " CX"           0.64
    " Rex"          0.63
    " Pax"          0.62
    " FX"           0.60
    " Lexington"    0.60
    " Rox"          0.60
    Activation Density: 1.272%

    No Known Activations