INDEX
    Explanations

    the word "legs" and words that can be associated with body parts

    Negative Logits
                -1.55
     the        -1.40
     in         -1.34
     a          -1.20
    ,           -1.14
    ↵↵          -1.09
     as         -1.08
     and        -1.05
     of         -1.04
     an         -1.03
    Positive Logits
     متعلقه     1.87
     Theſe      1.77
     myſelf     1.74
     Monfieur   1.69
     auffi      1.68
     pleaſure   1.66
     ſche       1.65
     Reſ        1.65
     iſt        1.63
     itſelf     1.63
    Activation Density: 1.301%

    No Known Activations