INDEX
Explanations
phrases indicating cause and effect
connecting words that indicate causation or consequence
Negative Logits
quit      -0.77
RAW       -0.68
ctions    -0.65
Swim      -0.65
wait      -0.64
wl        -0.64
rete      -0.63
cit       -0.63
boarded   -0.63
Submit    -0.63
POSITIVE LOGITS
preventing     1.66
reducing       1.66
enabling       1.59
facilitating   1.55
allowing       1.55
enhancing      1.54
ensuring       1.53
eliminating    1.50
boosting       1.49
preserving     1.48
Activations Density: 0.255%