INDEX
    Explanations

    based on financial or technical details

    tokens that occur in the assistant's long, explanatory reply text — especially opening/discourse tokens (like "Okay,") and other words in extended model-generated responses.

    Negative Logits
    Token              Value
    "myButtons"        0.52
    "ड़ने"              0.52
    "ಂಟು"              0.49
    " придется"        0.49
    " bisog"           0.49
    "кость"            0.48
    " quaisquer"       0.48
    "אים"              0.47
    " Polaribacter"    0.47
    "ान्य"              0.47
    Positive Logits
    Token       Value
    "github"    0.53
    " ("        0.51
    "Cl"        0.49
    "ref"       0.47
    "seed"      0.47
    "static"    0.46
    "water"     0.46
    "line"      0.46
    "ll"        0.45
    "style"     0.45
    Activation Density: 0.001%

    No Known Activations