INDEX
    Explanations
    Negative Logits
    Token        Logit
    "Gen"        -0.08
    " stroke"    -0.08
    " ges"       -0.07
    "PT"         -0.07
    "Stroke"     -0.07
    " Gen"       -0.07
    "stamp"      -0.07
    " gede"      -0.07
    "rub"        -0.07
    "stroke"     -0.07
    Positive Logits
    Token        Logit
    " waitress"  0.08
    "tern"       0.08
    " agrad"     0.08
    " Elaine"    0.08
    " தர"        0.08
    ".ev"        0.08
    " காவ"       0.07
    " ага"       0.07
    " Fiona"     0.07
    " Madonna"   0.07
    Activation Density: 0.001%

    No Known Activations