INDEX
    Explanations

    formal language negations

    instances of the word "didn't" or its variations in different contexts

    Negative Logits
    Token               Logit
    " attain"           -0.62
    "alg"               -0.62
    "ipes"              -0.60
    "cot"               -0.53
    " Article"          -0.52
    " approximately"    -0.52
    " PUBLIC"           -0.52
    " reciprocal"       -0.51
    " ens"              -0.51
    " instances"        -0.50
    Positive Logits
    Token               Logit
    " didn"             2.94
    " hadn"             2.53
    "didn"              2.51
    " wasn"             2.38
    " weren"            2.32
    " couldn"           2.29
    " didnt"            2.29
    " Didn"             2.21
    " wouldn"           2.10
    " hasn"             1.98
    Act Density 0.033%
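
    The density figure above can be read as the share of token positions on which the feature fires. The sketch below is one minimal way to compute such a number. It assumes "Act Density" means the fraction of positions with activation above zero, which the page itself does not state. The function name, the threshold parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires
    at all. The dashboard does not give its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 2.9]
density = activation_density(sample)

# Format as a percentage, in the same style as the dashboard figure.
print(f"{density:.3%}")
```

    A density this low would mean the feature is sparse: it fires on only a handful of tokens out of every ten thousand, consistent with a feature tied to a narrow family of contraction tokens.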

    No Known Activations