    Negative Logits
    "when"          -0.85
    " when"         -0.79
    " quando"       -0.79
    " cuando"       -0.79
    " lorsqu"       -0.75
    " avoient"      -0.74
    "cuando"        -0.74
    " fidé"         -0.71
    " étoient"      -0.70
    " lorsque"      -0.70
    Positive Logits
    " the"           0.94
    " a"             0.82
    " an"            0.72
    " it"            0.70
    " using"         0.69
    " compared"      0.68
    " designing"     0.67
    " “"             0.66
    " making"        0.66
    " there"         0.65
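The two lists above are commonly read as the feature's direct effect on next-token logits: the tokens it most promotes and most suppresses. A minimal sketch, assuming each score is the dot product of the feature's decoder direction with one column of the unembedding matrix (W_U); the exact normalization used for this dashboard is not stated here. The names `decoder_dir`, `unembed`, `vocab`, and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical W_U, shape d_model x vocab
) -> List[float]:
    """Dot product of the (assumed) decoder direction with each vocab column of W_U."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

def top_and_bottom(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of (token, score), strongest first."""
    scores = logit_effects(decoder_dir, unembed)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative

# Toy usage with made-up values (d_model = 2, four tokens).
vocab = [" the", " when", " a", "cuando"]
unembed = [
    [0.9, -0.8, 0.5, -0.7],
    [0.3, 0.3, 0.3, 0.3],
]
decoder_dir = [1.0, 0.0]
positive, negative = top_and_bottom(decoder_dir, unembed, vocab, k=2)
print(positive)
print(negative)
```

Leading spaces are kept in the token strings because tokenizers usually treat " when" and "when" as different tokens, which is why both appear separately in the lists above.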
    Activation Density 0.051%
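Activation density is usually the share of sampled token positions on which the feature fires, so 0.051% means roughly one position in two thousand. A minimal sketch, assuming "fires" means an activation strictly above zero; the sampling procedure and threshold behind the figure above are not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy usage: the feature fires on 51 of 100,000 hypothetical positions.
acts = [1.2] * 51 + [0.0] * (100_000 - 51)
density = activation_density(acts)
print(f"{density:.3f}%")
```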

    No Known Activations