    Explanations

    instances of the word "one"

    Negative Logits
    Token        Logit
    "rosse"      -0.08
    "ffi"        -0.07
    "inois"      -0.07
    "aney"       -0.07
    "roid"       -0.06
    "IBC"        -0.06
    "anca"       -0.06
    "chten"      -0.06
    "ruk"        -0.06
    "pery"       -0.06
    Positive Logits
    Token        Logit
    " of"        0.08
    "cle"        0.07
    " among"     0.06
    "ixture"     0.06
    " Kew"       0.06
    "woke"       0.06
    " Cad"       0.06
    "among"      0.05
    " Zu"        0.05
    " favorite"  0.05
    Activation Density: 0.013%

    No Known Activations