INDEX
Explanations
phrases indicating potential outcomes or consequences of actions
phrases indicating outcomes or consequences
Negative Logits
spaced        -0.72
craw          -0.71
bones         -0.70
periphery     -0.67
ut            -0.65
Straw         -0.64
floors        -0.64
Secrets       -0.63
pitch         -0.63
vigil         -0.61
Positive Logits
Enh           0.83
interstitial  0.80
UE            0.79
swers         0.79
uments        0.78
uced          0.78
antly         0.77
ivity         0.71
uces          0.71
enance        0.70
Activations Density 0.029%