INDEX
    Explanations

    instances of the start of textual segments or paragraphs

    Negative Logits
    .             -0.70
    >();          -0.69
    ');\n         -0.67
    >();\n        -0.65
    );\n          -0.65
     be           -0.63
    ;\n           -0.62
     also         -0.61
     …            -0.58
    ");\n         -0.58

    Positive Logits
     Jefus        0.96
     Shakspeare   0.92
     himſelf      0.92
     itſelf       0.86
     myſelf       0.84
     Monfieur     0.83
    <bos>         0.81
     theſe        0.78
     Theſe        0.78
     Eſ           0.76
    Act Density 0.864%

    No Known Activations