Explanations
phrases related to outcomes or consequences
phrases that indicate cause and effect outcomes
Negative Logits
Token        Logit
spaced       -0.75
bones        -0.74
craw         -0.68
ut           -0.65
periphery    -0.62
Straw        -0.61
current      -0.61
mens         -0.59
thur         -0.59
tera         -0.59
Positive Logits
Token          Logit
UE             0.84
antly          0.84
uced           0.81
Enh            0.79
interstitial   0.73
uments         0.72
aternity       0.71
uces           0.70
ãĤ¯            0.70
aneously       0.69
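
The entry ãĤ¯ in the list above looks like mojibake, but it matches the way GPT-2-style byte-level BPE vocabularies display raw bytes as printable characters. The sketch below assumes this dashboard's tokenizer uses that byte-to-character mapping (the page itself does not say so). The names `byte_level_table` and `decode_token` are illustrative and not part of any library.

```python
def byte_level_table():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the dashboard does not name its tokenizer)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is mapped to a code point starting at U+0100.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_level_table().items()}


def decode_token(displayed: str) -> str:
    """Turn a displayed vocabulary string back into the text it encodes."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in displayed)
    # A single token may hold only part of a multi-byte character,
    # so undecodable bytes are replaced rather than raising an error.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¯"))
```

Under that assumption the token corresponds to the bytes E3 82 AF, which is the UTF-8 encoding of the katakana character ク. ASCII-only entries such as "antly" or "uced" decode to themselves.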
Activations Density: 0.027%