INDEX
    Explanations

    components related to animals, particularly their physical features and characteristics

    Negative Logits
     face
    -0.20
    面
    -0.18
     Face
    -0.18
    Face
    -0.17
     Tun
    -0.17
    tro
    -0.16
     Tub
    -0.16
    face
    -0.16
    -face
    -0.15
     Tro
    -0.15
    Positive Logits
     tail
    0.85
    tail
    0.77
     Tail
    0.75
     tails
    0.71
    Tail
    0.71
    _tail
    0.64
    尾
    0.60
    .tail
    0.59
    tails
    0.59
    TAIL
    0.58
    Act Density 0.044%

    No Known Activations