    NEGATIVE LOGITS
    Token       Logit
    Ram         -0.07
    Simon       -0.07
     McCain     -0.06
                -0.06
                -0.06
    iyim        -0.06
     Ram        -0.06
    464         -0.06
                -0.06
     Mozart     -0.06
    POSITIVE LOGITS
    Token       Logit
     unfavor    0.08
    <Menu       0.07
    resh        0.07
    .paused     0.07
     weir       0.06
     (↵↵        0.06
     jj         0.06
                0.06
    ffi         0.06
     strang     0.06
    Activation Density: 0.003%

    No Known Activations