INDEX
Explanations
phrases related to critical thinking and evidence-seeking behavior
Negative Logits
Token      Logit
173        -0.14
.addTo     -0.14
indr       -0.14
vana       -0.13
essler     -0.13
omens      -0.13
andas      -0.12
.Ticks     -0.12
essen      -0.12
.nil       -0.12
Positive Logits
Token           Logit
unconventional  0.45
innovative      0.39
unusual         0.38
novel           0.36
innovate        0.35
bold            0.35
unique          0.35
innovation      0.34
nov             0.32
daring          0.32
Activations Density: 0.067%