    Explanations

    body parts, concepts, and behaviors

    Negative Logits
        Token        Logit
        " a"         1.10
        " "          0.79
        "i"          0.77
        "an"         0.75
        " an"        0.70
        "lN"         0.63
        "ta"         0.62
        "dana"       0.61
        "k"          0.59
        "as"         0.59

    Positive Logits
        Token                   Logit
        (token not rendered)    0.85
        " and"                  0.81
        "ни"                    0.76
        " are"                  0.75
        "у"                     0.75
        " και"                  0.74
        "ми"                    0.70
        " де"                   0.68
        "с"                     0.68
        " políticos"            0.68
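
    Feature dashboards of this kind commonly derive the two lists above by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. This page does not say how its values were computed, so the following is only a minimal sketch under that assumption. The helper names `logit_effects` and `top_and_bottom`, the 2-dimensional residual stream, the 4-token vocabulary, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the (hypothetical) decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative logit effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative

# Made-up toy values: 2-d residual stream, 4-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {
    " and": [0.8, 0.2],
    " are": [0.3, -0.6],
    " a": [-1.0, 0.4],
    "i": [-0.2, 1.0],
}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

    A real vocabulary has tens of thousands of tokens, so a practical version would replace the Python loop with a single matrix-vector product (for example in NumPy or PyTorch) before taking the top and bottom k.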

    Activation density: 3.541%
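
    Activation density is usually read as the share of tokens in an evaluation dataset on which the feature fires (activation above zero); for scale, 3.541% corresponds to roughly 35 active tokens per 1,000. The page does not state its exact definition or threshold, so the following is a minimal sketch under that assumption. The function name `activation_density` and the sample activations are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)

# Made-up activations for 8 tokens; the feature fires on 2 of them.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(acts):.3f}%")  # 25.000%
```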

    No Known Activations