INDEX
Explanations
references to noble individuals and titles
Negative Logits
wn          -0.66
Francine    -0.66
>);         -0.64
trock       -0.63
dvo         -0.63
bnf         -0.60
n           -0.60
Martinez    -0.60
Healy       -0.60
Staates     -0.59
Positive Logits
Noble        1.44
Noble        1.43
Nobel        1.28
Nobles       1.27
nobles       1.26
noble        1.22
Nobel        1.21
nobility     1.18
noble        1.10
nobler       1.07
Activations Density 0.004%