INDEX
Explanations
proper nouns
references to female characters or figures in the text
Negative Logits
Token         Logit
king          -0.68
revolt        -0.63
quality       -0.62
ethic         -0.61
fut           -0.59
gloss         -0.59
Rebellion     -0.59
liv           -0.57
Revolution    -0.57
league        -0.56
Positive Logits
Token         Logit
Ms             4.43
Ms             2.32
Mrs            1.96
Mr             1.91
Ps             1.44
Mrs            1.40
Ns             1.38
Mr             1.28
Vs             1.27
MR             1.26
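Lists like the two above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and taking the most negative and most positive entries. The sketch below assumes a sparse-autoencoder feature with a decoder direction of shape (d_model,) and an unembedding matrix of shape (d_model, vocab_size). The function name `top_logit_effects` and the toy matrices are hypothetical, and the dashboard's own pipeline may differ (for example in how it handles the final layer norm).

```python
import numpy as np


def top_logit_effects(decoder_direction, unembed, vocab, k=10):
    """Hypothetical helper: direct logit effect of one feature direction.

    decoder_direction: shape (d_model,), the feature's decoder vector (assumed).
    unembed: shape (d_model, vocab_size), the unembedding matrix (assumed).
    vocab: list of token strings, length vocab_size.
    Returns (negative, positive): k (token, logit) pairs each, most extreme first.
    """
    logits = decoder_direction @ unembed  # shape (vocab_size,)
    order = np.argsort(logits)            # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy example with made-up numbers; not the real model's weights.
    vocab = ["Ms", "Mrs", "king", "revolt", "the"]
    direction = np.array([1.0, 0.0])
    W_U = np.array([[4.0, 2.0, -0.7, -0.6, 0.1],
                    [0.0, 0.0, 0.0, 0.0, 0.0]])
    neg, pos = top_logit_effects(direction, W_U, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Tokens can repeat in such lists (as "Ms", "Mrs", and "Mr" do above) when the vocabulary holds several variants that differ only in leading whitespace, which a plain-text rendering hides.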
Activation Density: 0.004%
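An activation density of 0.004% corresponds to a fraction of 4e-5, or roughly one token in 25,000. The minimal sketch below shows one common way such a figure is computed. It assumes density means the share of tokens on which the feature's activation exceeds zero over some sample of text. The threshold of 0 and the name `activation_density` are hypothetical; the dashboard's exact definition and dataset are not stated here.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens where the feature fires.

    activations: 1-D array with the feature's activation on each token.
    threshold: activations strictly above this count as "firing" (assumed 0).
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


if __name__ == "__main__":
    # Toy sample: one firing token out of 25,000.
    acts = np.zeros(25_000)
    acts[0] = 1.3
    density = activation_density(acts)
    print(f"{density * 100:.3f}%")
```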