    Explanations

    conjunctions and phrases indicating addition or connection

    Negative Logits
    Logit  Token
    -0.84  .
    -0.66  (not shown)
    -0.56  ].
    -0.56  (not shown)
    -0.55  ..
    -0.55   .
    -0.54  ).
    -0.54  ".
    -0.52  <h2>
    -0.49  ;
    Positive Logits
    Logit  Token
    0.92  $")
    0.90  ',\n\n
    0.90  +};
    0.90  ='';\n
    0.88   betweenstory
    0.88  "},\n
    0.87  ніципалі
    0.86  WriteBarrier
    0.85   كومونز
    0.84  ")==
    Activation Density: 0.503%

    No Known Activations