INDEX
    Explanations

    punctuation marks and sentence boundaries

    numerical quantities and units

    Negative Logits
    Token             Logit
    " briefly"        -0.34
    " Volkes"         -0.31
    " neither"        -0.31
    "y"               -0.31
    " temporarily"    -0.31
    (whitespace)      -0.30
    " blindly"        -0.30
    " both"           -0.29
    (no token text)   -0.29
    "fully"           -0.28
    Positive Logits
    Token               Logit
    " ſind"             0.79
    "linawan"           0.79
    "encodeWith"        0.77
    "IntoConstraints"   0.77
    "<unused8>"         0.76
    "[@BOS@]"           0.76
    "<unused16>"        0.76
    "<unused74>"        0.76
    "<unused51>"        0.76
    "<unused14>"        0.75
    Act Density 0.117%

    No Known Activations