    Explanations

    references to numerical values, specifically in the context of statistics or measurements

    Negative Logits
    " myſelf"        -1.02
    " Anſ"           -0.91
    " Efq"           -0.91
    " uſed"          -0.86
    " themſelves"    -0.84
    " ſy"            -0.84
    "ſelf"           -0.82
    " purpoſe"       -0.82
    " reaſon"        -0.80
    " Monfieur"      -0.80
    Positive Logits
    " ${"            0.48
    " fixed"         0.47
    "<eos>"          0.46
    "elsa"           0.45
    "</b>"           0.45
    "нец"            0.45
    " oliva"         0.44
    " S"             0.44
    [missing]        0.43
    "</strong>"      0.42
    Activation Density: 0.020%

    No Known Activations