Explanations
instances of effort, research, and planning in various contexts
Negative Logits
ollider     -0.16
geç         -0.15
taş         -0.13
ichert      -0.13
enant       -0.13
能力        -0.13
.Binding    -0.12
andler      -0.12
ensburg     -0.12
Ending      -0.12
Positive Logits
research          0.33
investigation     0.28
deliber           0.27
contempl          0.27
strateg           0.27
thought           0.26
analysis          0.26
thinking          0.26
detective         0.25
experimentation   0.25
Activations Density 0.342%