    Explanations

    occurrences of the "<bos>" token, indicating the beginning of segments in the text

    Negative Logits
    Token               Logit
     himſelf            -0.86
     myſelf             -0.81
    WriteTagHelper      -0.80
    Espèce              -0.79
    ſelves              -0.78
     Jefus              -0.78
     themſelves         -0.78
     Efq                -0.76
     Theſe              -0.76
     Shakspeare         -0.76
    Positive Logits
    Token               Logit
    <eos>               1.02
     ***!               0.57
    ()));\n             0.56
     مرئيه              0.55
    .*")]               0.54
    )))\n               0.54
                        0.52
     fine               0.51
    ())).               0.49
    .                   0.48
    Act Density 0.153%

    No Known Activations