INDEX
Explanations
components related to animals, particularly their physical features and characteristics
Negative Logits
face    -0.20
面      -0.18
Face    -0.18
Face    -0.17
Tun     -0.17
tro     -0.16
Tub     -0.16
face    -0.16
-face   -0.15
Tro     -0.15
Positive Logits
tail    0.85
tail    0.77
Tail    0.75
tails   0.71
Tail    0.71
_tail   0.64
尾      0.60
.tail   0.59
tails   0.59
TAIL    0.58
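In raw byte-level BPE vocabularies, non-ASCII tokens such as 面 ("face") and 尾 ("tail") are stored as the strings `éĿ¢` and `å°¾`. The sketch below shows one way to recover the readable form, assuming the tokenizer uses the GPT-2 style byte-to-unicode mapping (the source does not name the tokenizer). The function names `byte_to_unicode_table` and `decode_token` are hypothetical.

```python
def byte_to_unicode_table():
    """Build the byte -> printable-character table used by GPT-2 style tokenizers.

    Assumption: the dashboard's tokenizer follows this scheme.
    """
    # Bytes that are already printable map to themselves.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in keep}
    # Every remaining byte is shifted to a code point at or above 256.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


def decode_token(display):
    """Turn a vocabulary string back into the UTF-8 text it encodes."""
    reverse = {ch: b for b, ch in byte_to_unicode_table().items()}
    raw = bytes(reverse[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


print(decode_token("\u00e9\u013f\u00a2"))  # 面
print(decode_token("\u00e5\u00b0\u00be"))  # 尾
```

Characters outside the mapping raise a `KeyError`, which is acceptable for a sketch but would need handling for arbitrary input.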
Activations Density 0.044%
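The density figure can be read as the share of tokens on which the feature fires. A minimal sketch, assuming density is the fraction of sampled tokens whose activation exceeds zero (the source does not state the definition). The function name, the `threshold` parameter, and the sample data are hypothetical and synthetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: this is the definition behind the dashboard figure.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 44 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 44 + [0.0] * (100_000 - 44)
print(f"{activation_density(sample):.3%}")  # 0.044%
```

Under this reading, a density of 0.044% corresponds to roughly 4 or 5 active tokens per 10,000.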