INDEX
Explanations
causal relationships or explanations
"Because" at the beginning of a sentence
"because" introducing an explanation
Negative Logits
ſelf        -0.94
ſeveral     -0.78
Majefty     -0.77
Efq         -0.77
ValueStyle  -0.76
himſelf     -0.76
ſelves      -0.75
myſelf      -0.74
ſta         -0.73
Houſe       -0.73
Positive Logits
they        1.12
it          1.01
we          0.95
Because     0.85
there       0.84
Because     0.81
ECAUSE      0.80
unlike      0.78
of          0.77
unlike      0.76
Activation density: 0.095%