INDEX
    Explanations

    phrases or concepts related to curiosity or questions

    structural tags and markup

    Negative Logits
    Token          Logit
    AndEndTag      -1.15
     myſelf        -1.14
    ロウィン       -1.03
    [@BOS@]        -1.02
    <unused41>     -1.02
    <unused16>     -1.02
    <unused28>     -1.02
    <unused23>     -1.02
    <unused8>      -1.02
    <unused14>     -1.02

    Positive Logits
    Token              Logit
    ↵↵                 0.52
    1                  0.47
    (whitespace)       0.44
    (no token shown)   0.41
    (whitespace)       0.40
    2                  0.39
     +                 0.39
    <h1>               0.38
    .                  0.38
    (no token shown)   0.38

    Activation Density: 0.069%

    No Known Activations