INDEX
    Explanations

    lines starting with a specific token or header, indicating sections or topics in a document

    Negative Logits
     Huntingdon     -0.77
     Danilo         -0.75
     Jarrett        -0.70
     Gaby           -0.70
     CommonModule   -0.66
    ('{             -0.64
    [](             -0.63
                    -0.63
     Lafayette      -0.63
     Schwar         -0.63
    Positive Logits
    principalTable  0.93
    TagMode         0.87
     cuisson        0.82
    ')}}"           0.82
    Gru             0.81
    afone           0.81
     Cron           0.80
     Rug            0.79
     Gru            0.79
    )")             0.78
    Activation Density 0.069%

    No Known Activations