INDEX
    Explanations

    sections of text that contain the character "-" followed by a non-zero activation

    Negative Logits
    Token               Logit
    sizeCache           -1.00
     démocr             -0.92
    $")                 -0.86
    )");                -0.86
     ModelExpression    -0.84
     Efq                -0.84
     Савезне            -0.83
     étoit              -0.83
     étoient            -0.83
     $_"                -0.83
    Positive Logits
    Token               Logit
    -                   0.64
    (                   0.59
    *                   0.52
    (whitespace)        0.52
    (not shown)         0.51
    (not shown)         0.49
    (whitespace)        0.49
    _                   0.48
    GenerationType      0.47
    <eos>               0.46
    Act Density 0.216%

    No Known Activations