Explanations
expressions of familial relationships and affection
Negative Logits
Token       Logit
Actress     -0.18
Woman       -0.17
mistress    -0.17
lady        -0.17
osit        -0.17
woman       -0.17
woman       -0.16
Lady        -0.16
Lady        -0.15
pregnant    -0.15
Positive Logits
Token       Logit
dad         0.65
Dad         0.63
father      0.59
dads        0.59
Father      0.57
fathers     0.56
daddy       0.54
dad         0.52
Fathers     0.52
Father      0.51
Activations Density 0.066%