INDEX
    Explanations
    No Explanations Found
    Negative Logits
        (blank)        -0.08
        " weaponry"    -0.07
        " Anthony"     -0.07
        "Anthony"      -0.07
        "ATEG"         -0.07
        " DEFIN"       -0.06
        " ضد"          -0.06
        " Manny"       -0.06
        "От"           -0.06
        "uide"         -0.06
    Positive Logits
        " girl"     0.18
        " Girl"     0.17
        " Girls"    0.16
        " girls"    0.16
        "Girl"      0.13
        "girl"      0.12
        "Girls"     0.12
        "-girl"     0.12
        "girls"     0.11
        " Gir"      0.09
    Activation Density: 0.021%

    No Known Activations