INDEX
    Explanations

    copying data

    Negative Logits
     bieten          -0.07
     LGBTQ           -0.07
    Rejected         -0.07
     وسي             -0.07
     Makeup          -0.07
     One             -0.07
     Patel           -0.07
     Offering        -0.07
     Mechan          -0.07
     Merchant        -0.07
    Positive Logits
    —from            0.08
     Osman           0.08
     hacia           0.08
     almoh           0.07
     silic           0.07
     mellan          0.07
     universidades   0.07
                     0.07
     world's         0.07
     unto            0.07
    Act Density 0.003%

    No Known Activations