INDEX
    Explanations
    Negative Logits
    "Chicago"      -0.16
    "inho"         -0.16
    "asil"         -0.15
    " Chicago"     -0.15
    "ahoma"        -0.15
    "Py"           -0.14
    " Boulder"     -0.14
    " chicago"     -0.14
    "PY"           -0.14
    " Missouri"    -0.14
    Positive Logits
    " NB"          0.32
    " Shed"        0.27
    "NB"           0.27
    " Freder"      0.26
    " Edmund"      0.25
    " Nack"        0.24
    " Mir"         0.24
    "506"          0.23
    ".nb"          0.21
    "nb"           0.21
    Act Density 0.049%

    No Known Activations