    Negative Logits
    Token           Logit
    " Moj"          -0.19
    "bsp"           -0.16
    "oru"           -0.15
    "alice"         -0.15
    "utow"          -0.15
    "ubl"           -0.15
    "ãĥĭãĥĥãĤ¯"     -0.14
    "ortal"         -0.14
    "rio"           -0.14
    "inyin"         -0.14
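    The token `ãĥĭãĥĥãĤ¯` in the negative-logits list above looks like a byte-level BPE vocabulary string rather than plain text. The source does not name the tokenizer, so the following is only a sketch under the assumption that it uses the GPT-2 style byte-to-unicode mapping. `byte_decoder` and `decode_token` are hypothetical helper names introduced for illustration.

```python
def byte_decoder():
    """Build the inverse of a GPT-2 style byte-to-unicode table (assumed mapping)."""
    # Bytes that the mapping represents by their own code point.
    keep = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    mapping = {b: chr(b) for b in keep}
    # Every remaining byte is shifted to code points starting at 256.
    n = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + n)
            n += 1
    # Invert: displayed character -> original byte value.
    return {ch: b for b, ch in mapping.items()}


def decode_token(token):
    """Turn a byte-level vocabulary string back into readable text."""
    table = byte_decoder()
    raw = bytes(table[ch] for ch in token)
    # errors="replace" keeps partial multi-byte tokens from raising.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĭãĥĥãĤ¯"))
```

    If the assumption holds, the printed string is the readable form of that token. If the dashboard's model uses a different tokenizer, the result is not meaningful.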
    Positive Logits
    Token           Logit
    "Ze"            0.34
    " Zealand"      0.33
    " Ze"           0.33
    " ze"           0.30
    "ze"            0.25
    "ZE"            0.24
    " zeal"         0.23
    "-Z"            0.23
    " Guinea"       0.22
    "stalk"         0.22
    Activation Density: 0.009%

    No Known Activations