INDEX
    Explanations

    mentions of the word "Stone" with varying activations

    instances of the word "Stone."

    Negative Logits
    oresc       -0.79
    merce       -0.77
    olulu       -0.76
    ornia       -0.75
    orescence   -0.74
    ntil        -0.72
    unal        -0.70
    unct        -0.69
    ulate       -0.68
    ership      -0.67
    Positive Logits
    falls       0.91
    hill        0.88
    works       0.83
    lings       0.83
     Cold       0.83
    ring        0.81
    hook        0.81
    asure       0.81
    asures      0.80
     Age        0.79
    Activation Density: 0.019%

    No Known Activations