    Explanations

    mentions of Charles Darwin and related terms or variations of his name

    Negative Logits
    tte     -0.17
    ander   -0.16
    ih      -0.16
    gers    -0.15
    ity     -0.15
    ited    -0.15
    yr      -0.15
    ño      -0.15
    ihan    -0.14
    embr    -0.14
    Positive Logits
    lington  0.23
    lene     0.22
    win      0.22
    lings    0.21
    ío       0.21
    fur      0.21
    Dar      0.20
    shan     0.20
    wish     0.20
    erca     0.20
    Activation Density: 0.013%

    No Known Activations