INDEX
    Explanations

    instances of the word "find" across various contexts

    Negative Logits
    "assisted"     -0.75
    "ansky"        -0.73
    "agn"          -0.71
    "stroke"       -0.67
    "inion"        -0.67
    "forming"      -0.65
    "jab"          -0.65
    "awar"         -0.64
    "haw"          -0.63
    "idium"        -0.62
    Positive Logits
    " plenty"      1.06
    " yourself"    0.87
    " yourselves"  0.86
    " lots"        0.82
    " ample"       0.81
    " references"  0.79
    " traces"      0.78
    " them"        0.77
    " fewer"       0.75
    " ourselves"   0.74
    Activation Density: 0.029%

    No Known Activations