INDEX
Explanations
phrases denoting rules or guidelines
conjunctions and phrases with a focus on defining and clarifying standards or expectations
Negative Logits
rocket          -0.60
staking         -0.59
fec             -0.58
cision          -0.57
Pyramid         -0.56
ginger          -0.55
Dian            -0.54
assassination   -0.54
Adult           -0.53
enne            -0.53
Positive Logits
how             1.07
what            1.03
why             0.96
whats           0.95
what            0.94
why             0.92
shouldn         0.89
whence          0.86
where           0.78
WHY             0.78
Activations Density 0.075%