INDEX
    Explanations

    proper nouns, specifically names and titles

    Negative Logits
    ä              -0.17
    çķ             -0.17
    é              -0.16
    lu             -0.16
    adam           -0.15
    yal            -0.15
    /Foundation    -0.15
    tah            -0.15
    tent           -0.15
    l              -0.15

    Positive Logits
     arcs          0.17
    csi            0.16
    ció            0.15
    asan           0.15
    conomy         0.15
    cs             0.14
    plete          0.14
    lyn            0.14
    iag            0.14
    gb             0.14

    Activation Density: 0.002%

    No Known Activations