INDEX
    Explanations

    punctuation marks, particularly commas

    Negative Logits
    'AddTagHelper'    -0.96
    ' zwiſchen'       -0.90
    '<unused43>'      -0.90
    '<unused28>'      -0.89
    '<unused23>'      -0.89
    '<unused8>'       -0.89
    '<unused14>'      -0.89
    '<unused51>'      -0.89
    '[@BOS@]'         -0.89
    '<pad>'           -0.89
    Positive Logits
    ','        0.42
               0.33
    '  '       0.33
    ' cool'    0.32
    ' big'     0.32
               0.32
    ':'        0.31
    'old'      0.31
    '"'        0.31
    ' '        0.30
    Act Density 0.023%

    No Known Activations