    Negative Logits
    Token            Logit
    "your"           -0.82
    " your"          -0.79
    "Your"           -0.69
    "you"            -0.67
    "Votre"          -0.65
    " you"           -0.65
    "YOUR"           -0.63
    " Your"          -0.61
    " votre"         -0.54
    " youre"         -0.54
    Positive Logits
    Token            Logit
    " were"          0.78
    " was"           0.74
    " did"           0.71
    "therners"       0.61
    " Europeans"     0.61
    " WERE"          0.60
    " physicists"    0.59
    " astronomers"   0.58
    "shadowColor"    0.57
    "arians"         0.57
    Activation Density: 0.001%

    No Known Activations