    Negative Logits
    Token             Logit
    " qualitative"    -0.08
    " closets"        -0.08
    (no token shown)  -0.08
    " qualit"         -0.07
    " closet"         -0.07
    "astra"           -0.07
    (no token shown)  -0.07
    "qondo"           -0.07
    ".dr"             -0.07
    " describe"       -0.07
    Positive Logits
    Token             Logit
    "9"               0.09
    "441"             0.08
    " quatre"         0.08
    " cuatro"         0.08
    "44"              0.08
    " quattro"        0.08
    "6"               0.08
    "429"             0.08
    "305"             0.08
    "steam"           0.08
    Activation Density: 0.048%

    No Known Activations