    Explanations

    references to humans and their interactions with other beings or entities

    Negative Logits
    "iverz"     -0.17
    "eyse"      -0.17
    "icast"     -0.15
    "jours"     -0.15
    "uzey"      -0.15
    "agnar"     -0.15
    "arness"    -0.15
    "ork"       -0.15
    "allah"     -0.14
    "agn"       -0.14
    Positive Logits
    " human"     0.63
    " humans"    0.59
    "human"      0.50
    " Humans"    0.49
    " Human"     0.48
    "-human"     0.47
    "Human"      0.44
    "人类"       0.43
    "Humans"     0.40
    "_human"     0.40
    Activation Density: 0.170%

    No Known Activations