INDEX
Explanations
phrases with words related to uncertainty and speculation, along with technical terms and statistical concepts
Neuron Alignment
Index   Value   % of L₁
872     +0.12   0.3%
1652    +0.10   0.3%
45      +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1652    +0.12      0.05
45      +0.10      0.03
1580    +0.08      0.03
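The two columns above are presumably a Pearson correlation and a cosine similarity, though the page does not say which vectors are being compared. As a minimal sketch of how the two measures differ, the following computes both for a pair of made-up activation vectors; `feature_acts` and `neuron_acts` are hypothetical values for illustration, not data from this dashboard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm  # undefined (division by zero) if either vector is constant


def cosine(xs, ys):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(xs, ys))
    norm = math.sqrt(sum(a * a for a in xs)) * math.sqrt(sum(b * b for b in ys))
    return dot / norm  # undefined (division by zero) if either vector is all zeros


# Hypothetical activations over four tokens (assumption, not dashboard data).
feature_acts = [0.0, 0.5, 0.0, 1.0]
neuron_acts = [1.0, 2.0, 1.0, 3.0]  # equals 2 * feature_acts + 1

print("Pearson:", pearson(feature_acts, neuron_acts))
print("Cosine: ", cosine(feature_acts, neuron_acts))
```

Because the second vector is a shifted and scaled copy of the first, the Pearson correlation is 1 while the cosine similarity is lower. This is one reason the two columns can disagree for the same neuron.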
Negative Logits
gaily      -0.78
Daven      -0.72
Confu      -0.71
Vaugh      -0.70
Middles    -0.68
brilli     -0.67
ecru       -0.66
Keny       -0.66
oreo       -0.65
disagre    -0.64
Positive Logits
guess          0.78
guesses        0.75
speculation    0.74
guessing       0.74
speculate      0.64
conjecture     0.64
assumptions    0.62
guessed        0.59
speculative    0.59
guess          0.58
Activations Density 0.448%
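The page does not define activation density. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below computes that fraction; the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumes "density" means the share of tokens on which the feature fires;
    this definition is an assumption, not taken from the dashboard.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, the feature fires on 5 of them.
sample_acts = [0.0] * 995 + [0.3, 1.2, 0.7, 0.1, 2.0]

density = activation_density(sample_acts)
print(f"Activation density: {density:.3%}")
```

Under this reading, a density of 0.448% would mean the feature fires on roughly 4 or 5 of every 1000 sampled tokens.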