INDEX
    Explanations

    math problems, equations

    Structural and discourse cues in the model's step-by-step math solutions (e.g., response headers, newlines and section breaks, and procedural lead-ins that mark the start of an explanation).

    Negative Logits
    "es"           0.50
    "hes"          0.46
    "s"            0.46
    "iever"        0.46
    "Therefore"    0.46
    "rain"         0.46
    "ations"       0.43
    "al"           0.43
    "herd"         0.43
    "en"           0.42
    Positive Logits
    " учиты"                0.45
    "Inspect"               0.45
    " සෑම"                  0.45
    "ष्"                     0.45
    [token not recovered]   0.44
    " gewoon"               0.44
    " Inspect"              0.43
    "ט"                     0.43
    " آبی"                  0.43
    [token not recovered]   0.43
    Activation Density 0.051%

    No Known Activations