INDEX
    Explanations

    references to the human body and its characteristics

    Negative Logits
    avigator    -0.16
    umber       -0.15
    stin        -0.15
    nable       -0.15
    sti         -0.15
    abelle      -0.15
    enberg      -0.15
    coh         -0.14
    maj         -0.14
    bove        -0.14
    Positive Logits
    guards      0.23
    guard       0.21
    weight      0.20
    gren        0.16
    wide        0.16
    gaard       0.16
    elter       0.15
    558         0.15
     phận       0.15
    -body       0.14
    Act Density 0.042%

    No Known Activations