Explanations
proper nouns related to female figures
references to the term "lady" in various contexts
Negative Logits
CS              -0.70
Blocks          -0.64
Ratings         -0.62
ingu            -0.61
contiguous      -0.61
Hansen          -0.60
anks            -0.59
Applications    -0.59
keys            -0.59
casts           -0.59
Positive Logits
lady            3.68
ladies          2.20
Lady            2.15
Lady            2.11
woman           1.88
girl            1.71
gentleman       1.70
Woman           1.64
woman           1.61
Ladies          1.50
Activations Density: 0.013%