INDEX
    Explanations

    capital letters followed by a 'K'

    Negative Logits
    "andon"      -0.16
    "rott"       -0.15
    "iego"       -0.15
    " Contrast"  -0.14
    "umann"      -0.14
    "unas"       -0.14
    " Cas"       -0.14
    "μÏĢο"       -0.14
    "qu"         -0.14
    "ium"        -0.14
    Positive Logits
    " K"    0.20
    "oen"   0.17
    "oko"   0.14
    "AKE"   0.14
    "LM"    0.14
    " k"    0.14
    "atsu"  0.14
    "inks"  0.14
    "oser"  0.14
    "kus"   0.14
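    On sparse-autoencoder dashboards, positive and negative logit lists like the ones above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the highest- and lowest-scoring tokens. The sketch below assumes that definition; it is not taken from this page. The names `top_logit_effects`, `feature_dir`, and `W_U` are hypothetical, and the toy numbers are illustrative, not this feature's real weights.

```python
def top_logit_effects(feature_dir, W_U, vocab, k=10):
    """Logit-lens style projection of one feature (assumed definition).

    feature_dir: list of d_model floats, the feature's decoder direction (hypothetical)
    W_U: d_model x d_vocab nested list, the unembedding matrix (hypothetical)
    vocab: list of d_vocab token strings
    Returns (negative, positive), each a list of (token, score), most extreme first.
    """
    d_vocab = len(vocab)
    # Score for token j is the dot product of the feature direction with column j of W_U.
    scores = [
        sum(feature_dir[i] * W_U[i][j] for i in range(len(feature_dir)))
        for j in range(d_vocab)
    ]
    # Token indices sorted from lowest to highest score.
    order = sorted(range(d_vocab), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, 0.0],
    [[0.2, -0.16, 0.05, 0.1], [0.0, 0.0, 0.0, 0.0]],
    [" K", "andon", "x", "oen"],
    k=2,
)
print(neg)
print(pos)
```

Leading spaces in tokens such as `" K"` are kept because they mark a word boundary in the tokenizer's vocabulary.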
    Activation Density: 0.084%
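    Activation density is usually reported as the share of sampled tokens on which the feature fires, i.e. has an activation above zero. Assuming that definition, 0.084% corresponds to roughly 84 firing tokens per 100,000. A minimal sketch with a hypothetical `activation_density` helper:

```python
def activation_density(activations):
    """Percentage of tokens with a strictly positive activation (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# 84 firing tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_916 + [1.0] * 84
print(activation_density(sample))
```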

    No Known Activations