    Negative Logits
    Token            Logit
    " was"           -0.10
    " were"          -0.09
    " could"         -0.09
    " would"         -0.08
    " WAS"           -0.07
    "could"          -0.07
    " had"           -0.07
    " hadn"          -0.07
    " Was"           -0.07
    " isn"           -0.07
    Positive Logits
    Token            Logit
    "acomment"        0.07
    "/scripts"        0.07
    " strt"           0.07
    "нг"              0.07
    ".di"             0.06
    "writing"         0.06
    "ано"             0.06
    ".det"            0.06
    " avoided"        0.06
    " temperament"    0.06
    Activation density: 0.856%

    No known activations.