INDEX
    Explanations

    references to the word "girl" in various forms and contexts

    Negative Logits
    Token       Logit
    -uri        -0.15
    ÑıÑħ        -0.15
    ined        -0.15
    Ïīμα        -0.15
    mark        -0.15
     fog        -0.15
    lom         -0.14
     Wayback    -0.14
    cx          -0.14
    racat       -0.14
    Positive Logits
    Token       Logit
    affe        0.36
    aff         0.28
    AFF         0.23
    affer       0.22
     Gir        0.21
    ardin       0.21
    oux         0.20
    friend      0.20
    aud         0.20
     gir        0.20
    Activation Density: 0.005%

    No Known Activations