    Negative Logits
    confirmed     -0.07
     measures     -0.07
     funded       -0.07
     narrowly     -0.07
    Surname       -0.07
     Düz          -0.07
     scrim        -0.06
     ASA          -0.06
     Alternate    -0.06
     boob         -0.06
    Positive Logits
    (missing)     0.07
     yerine       0.07
    vro           0.06
    aman          0.06
     port         0.06
     ">           0.06
    (missing)     0.06
    (missing)     0.06
     ironically   0.06
    ,this         0.06
    Activation Density: 0.008%

    No Known Activations