    Explanations

    references to the concept of "humans" and their qualities or conditions

    Negative Logits
    Token            Logit
    " kệ"            -0.42
    " respective"    -0.38
    " distinción"    -0.38
    "ท้าย"            -0.37
    "illage"         -0.36
    "IKI"            -0.36
    " Besten"        -0.35
    "ilaire"         -0.35
    "tilles"         -0.35
    "retum"          -0.35
    Positive Logits
    Token            Logit
    "Human"          1.16
    " Human"         1.11
    "human"          1.10
    "HUMAN"          1.00
    " human"         0.99
    " HUMAN"         0.94
    " Humans"        0.90
    "Humans"         0.90
    "humans"         0.82
    " umani"         0.81
    Activation Density: 0.108%

    No Known Activations