Explanations
mentions of Charles Darwin and related terms or variations of his name
Negative Logits
Token    Logit
tte      -0.17
ander    -0.16
ih       -0.16
gers     -0.15
ity      -0.15
ited     -0.15
yr       -0.15
ño       -0.15
ihan     -0.14
embr     -0.14
Positive Logits
Token    Logit
lington  0.23
lene     0.22
win      0.22
lings    0.21
ío       0.21
fur      0.21
Dar      0.20
shan     0.20
wish     0.20
erca     0.20
Activations Density 0.013%