INDEX
Explanations
proper nouns referring to individuals
instances and references to the term "born"
Negative Logits
Token       Logit
NRS         -0.64
degraded    -0.64
oter        -0.64
strip       -0.63
ioxide      -0.60
aito        -0.60
arty        -0.60
Adv         -0.59
battered    -0.59
olicy       -0.58
Positive Logits
Token       Logit
born        1.39
lings       1.02
ness        1.00
nesses      0.99
stellar     0.94
Born        0.90
stein       0.86
furt        0.82
lisher      0.81
tons        0.80
Activations Density: 0.008%