INDEX
    Explanations

    questions or statements of reasoning related to topics of moral or ethical consideration

    "because" or similar causal words

    Negative Logits
    " Comprometido"     -0.64
    "contentLoaded"     -0.63
    "IntoConstraints"   -0.62
    "featureID"         -0.61
    "errHandler"        -0.56
    " deſſen"           -0.55
    " artesanales"      -0.55
    "aarrggbb"          -0.55
    ":✨"                -0.54
    " geweſen"          -0.53
    Positive Logits
    " because"    0.88
    "because"     0.74
    " Because"    0.70
    "Because"     0.69
    "是因为"       0.65
    " porque"     0.63
    " reasons"    0.61
    " simply"     0.59
    " 때문"        0.57
    " BECAUSE"    0.56
    Activation Density: 0.324%

    No Known Activations