INDEX
    Explanations

    the word "couldn't" with high activation values

    instances of the word "couldn't."

    Negative Logits
    Token             Logit
    "oak"             -0.69
    "dress"           -0.67
    " protected"      -0.65
    " liberated"      -0.62
    "backer"          -0.62
    "croft"           -0.61
    "ULT"             -0.60
    " PU"             -0.60
    " deposition"     -0.58
    "otype"           -0.58
    Positive Logits
    Token             Logit
    "'t"              1.51
    "adian"           0.96
    " afford"         0.91
    "atio"            0.90
    "kered"           0.90
    "ayan"            0.85
    "anke"            0.84
    " feas"           0.84
    "ilater"          0.80
    "ieve"            0.79
    Activation Density: 0.012%

    No Known Activations