INDEX
Explanations
phrases related to reasons or explanations
causal explanations or justifications in statements
Negative Logits
Token       Logit
achable     -0.79
ipers       -0.71
Deal        -0.64
iveness     -0.63
mbol        -0.62
venge       -0.62
scribe      -0.62
rouse       -0.61
urity       -0.61
holy        -0.60
Positive Logits
Token        Logit
although     0.94
"[           0.82
despite      0.79
while        0.79
unlike       0.78
respondents  0.77
neither      0.76
whilst       0.75
ecause       0.75
Lack         0.74
Activations Density: 0.418%