INDEX
    Explanations

    terms related to physical body parts

    Negative Logits
    Token          Logit
    " mask"        -0.62
    "ween"         -0.59
    " Niet"        -0.59
    "flush"        -0.59
    " masks"       -0.59
    " elig"        -0.58
    " Homo"        -0.58
    " fer"         -0.57
    " awake"       -0.57
    " Woodward"    -0.56

    Positive Logits
    Token          Logit
    "ageddon"      1.43
    "aceutical"    1.37
    "ovie"         1.11
    "ament"        1.04
    "ichael"       1.03
    "essage"       1.02
    "ony"          1.02
    "achine"       0.98
    "strong"       0.92
    "onica"        0.92

    Activation Density: 0.014%

    No Known Activations