INDEX
    Explanations

    sections of the text where there is a significant activation, indicating the beginning of a new segment or major topic shift

    Negative Logits
    ніципалі    -0.60
    orap        -0.60
    dav         -0.57
    іга         -0.57
     Chio       -0.56
    ening       -0.56
    idhi        -0.56
     Merid      -0.55
    mistak      -0.55
    úly         -0.54
    Positive Logits
    (blank)     1.80
    ↵↵          1.51
    </h4>       1.18
    '])){       1.12
    ')));       1.09
    )){         1.08
    </h3>       1.08
    "]];        1.06
    "])){       1.06
    "]);        1.06
    Act Density 0.110%

    No Known Activations