    Negative Logits
    Token           Logit
    " Lie"          -0.08
    " Puis"         -0.08
    " machten"      -0.08
    " hinted"       -0.08
    " lia"          -0.07
    "latent"        -0.07
    " indicated"    -0.07
    "lh"            -0.07
    "grees"         -0.07
    " culp"         -0.07
    Positive Logits
    Token           Logit
    " saline"        0.11
    " socks"         0.09
                     0.08
    " Nava"          0.08
    " Coco"          0.08
    " uri"           0.08
    " cobalt"        0.08
    "arsu"           0.08
    ".azure"         0.08
    "(frm"           0.08
    Activation Density: 0.003%

    No Known Activations