INDEX
    Explanations

    sections related to study methodology and experimental design

    Section titles: "Background", "Methods", or "Results"

    Negative Logits
    <unused14>  -1.04
    <unused16>  -1.03
    <unused41>  -1.03
    <unused79>  -1.03
    <unused28>  -1.03
    <unused3>   -1.03
    <unused51>  -1.03
    <unused52>  -1.03
    <pad>       -1.02
    [@BOS@]     -1.02
    Positive Logits
    :           0.47
    ↵↵          0.39
    </strong>   0.35
    ;           0.34
                0.34
    ładka       0.31
    ?           0.31
    </h3>       0.30
     Features   0.30
    </em>       0.29
    Act Density 0.290%

    No Known Activations