INDEX
Explanations
connections or relationships between concepts and items
Negative Logits
;width       -0.23
wrists       -0.20
Women        -0.19
wings        -0.19
,width       -0.18
weights      -0.18
*width       -0.18
ÙĪØ¹         -0.18
Woman        -0.18
Wikipedia    -0.18
Positive Logits
unw          0.21
girl         0.18
girls        0.18
height       0.18
girls        0.17
height       0.17
Girl         0.16
Girls        0.16
girl         0.16
monthly      0.15
Activations Density 0.168%