INDEX
    Explanations

    prominent female figures or references to women

    Negative Logits
    Token            Logit
    "oste"           -0.15
    "ova"            -0.14
    " Picks"         -0.13
    " Sala"          -0.13
    "epar"           -0.13
    "wers"           -0.13
    "à¹Ģà¸Ĺศ"      -0.13
    "ALLE"           -0.13
    "nej"            -0.13
    "atsu"           -0.13
    Positive Logits
    Token            Logit
    " said"          0.16
    "empo"           0.16
    "ãĥ¼ãĤ¿ãĥ¼"    0.15
    "SCALL"          0.15
    "pto"            0.14
    "bons"           0.14
    "odium"          0.14
    "λία"            0.14
    "zza"            0.14
    "ãģ«ãĤĪ"        0.14
    Activation Density: 0.380%

    No Known Activations