INDEX
Explanations
questions or statements of reasoning related to topics of moral or ethical consideration
"because" or similar causal words
Negative Logits
Comprometido (Spanish, "committed")   -0.64
contentLoaded                         -0.63
IntoConstraints                       -0.62
featureID                             -0.61
errHandler                            -0.56
deſſen (German "dessen", long-s)      -0.55
artesanales (Spanish, "artisanal")    -0.55
aarrggbb                              -0.55
:✨                                    -0.54
geweſen (German "gewesen", long-s)    -0.53
Positive Logits
because                               0.88
because                               0.74
Because                               0.70
Because                               0.69
是因为 (Chinese, "is because")         0.65
porque (Spanish, "because")           0.63
reasons                               0.61
simply                                0.59
때문 (Korean, "because of")            0.57
BECAUSE                               0.56
Activation density: 0.324%
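Activation density is the share of tokens on which the feature fires, so 0.324% means the feature is active on roughly 1 in 300 tokens. The sketch below shows that calculation under stated assumptions: activations are given as a flat list of per-token values, and "fires" means strictly above a threshold of 0. The function name `activation_density` and the example data are hypothetical, not part of the dashboard.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "firing" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical example: a feature that fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```

A density this low is typical of a sparse, fairly specific feature such as one tied to causal connectives.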