INDEX
    NEGATIVE LOGITS
    Token        Logit
    "i"          -0.75
    "when"       -0.69
    "When"       -0.65
    "o"          -0.63
    "s"          -0.60
    "cuando"     -0.60
    "e"          -0.60
    "ed"         -0.56
    "hesis"      -0.56
    "quando"     -0.55
    POSITIVE LOGITS
    Token            Logit
    " you"           0.93
    " we"            0.89
    " the"           0.82
    " it"            0.80
    " someone"       0.72
    " there"         0.71
    " I"             0.66
    " Do"            0.66
    " discussing"    0.65
    " a"             0.65
    Activation Density: 0.066%

    No Known Activations