Explanations
phrases indicating a situation or context of uncertainty or speculation
Neuron Alignment
Index    Value    % of L₁
50       +0.26    1.4%
897      +0.12    0.6%
1678     +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
897      +0.26       0.02
1604     +0.12       0.02
1035     +0.11       0.02
Negative Logits
<bos>        -3.12
get          -0.73
<?           -0.72
put          -0.72
operate      -0.68
got          -0.67
go           -0.67
connect      -0.67
protected    -0.67
look         -0.64
Positive Logits
lidl         1.72
wien         1.69
tew          1.66
affor        1.66
squa         1.65
milf         1.65
ftu          1.65
desir        1.63
stockholm    1.62
fte          1.59
Activations Density 0.043%