    Negative Logits
    Token         Logit
    "(ob"         -0.07
    " Peter"      -0.07
    "Peter"       -0.07
    " drive"      -0.07
    " Polo"       -0.07
    " eos"        -0.07
    " drummer"    -0.06
    " сторону"    -0.06
    " duke"       -0.06
    " Amp"        -0.06
    Positive Logits
    Token               Logit
    ".Authentication"   0.07
    " confess"          0.07
    (token missing)     0.07
    (token missing)     0.07
    " confessed"        0.07
    " Podesta"          0.07
    " WTF"              0.06
    "brace"             0.06
    "Adresse"           0.06
    "login"             0.06
    Activation Density: 0.004%

    No Known Activations