INDEX
Explanations
phrases indicating reasons or motivations for actions
phrases that specify causal relationships or conditions, particularly focusing on the word "because."
Negative Logits
ength       -0.67
)].         -0.66
mun         -0.65
Flavoring   -0.62
Il          -0.61
Strongh     -0.60
©¶æ         -0.60
Home        -0.59
imeters     -0.59
creen       -0.59
Positive Logits
mention      0.79
slightest    0.75
versa        0.69
nor          0.67
anything     0.66
necessarily  0.65
icable       0.65
anywhere     0.64
ivable       0.63
anymore      0.63
Activations Density: 0.138%