Explanations
terms related to the inevitability of consequences and the importance of acknowledging foundational truths
Negative Logits
reasons      -0.18
powers       -0.17
flags        -0.17
answers      -0.17
answers      -0.17
yles         -0.17
arguments    -0.16
goals        -0.16
ways         -0.16
flags        -0.16
Positive Logits
feature      0.43
feature      0.35
attribute    0.33
phenomenon   0.32
trait        0.30
Feature      0.30
aspect       0.29
Feature      0.29
requirement  0.28
-feature     0.27
Activations Density: 0.498%