INDEX
    Explanations

    phrases introducing reasoning or justification

    NEGATIVE LOGITS
    makeText
    -0.70
    ()")
    -0.65
    )")
    -0.63
    `;
    
    -0.63
    encodeWith
    -0.62
    cotch
    -0.61
    \",\
    -0.61
    `,
    
    -0.60
    "',
    -0.60
    sproz
    -0.59
    POSITIVE LOGITS
     reasons
    0.94
     purposes
    0.80
    reasons
    0.71
     REASONS
    0.69
     sake
    0.68
     instance
    0.67
     Purposes
    0.64
     Reasons
    0.63
     reason
    0.62
    Reasons
    0.61
    Activation Density 0.418%

    No Known Activations