I am refactoring an analytics system that will do a lot of calculation, and I need some ideas on possible architectural designs for a data consistency issue I am facing.
Current architecture
I have a queue-based system in which different requesting applications create messages that are eventually consumed by workers.
Each "Requesting App" breaks a large calculation down into smaller pieces, which are sent to the queue and processed by the workers.
When all the pieces are finished, the originating "Requesting App" consolidates the results.
Also, the workers read information from a centralized database (SQL Server) in order to process the requests. (Important: the workers do not change any data in the database; they only read it.)
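To make the flow above concrete, here is a minimal single-process sketch of the split / process / consolidate cycle. It is only an illustration: the function names (`split`, `worker`, `run_large_calculation`) are hypothetical, Python's in-process `queue.Queue` stands in for the real message queue, and summing numbers stands in for the real calculation.

```python
from queue import Queue


def split(numbers, chunk_size):
    """Break a large calculation into smaller pieces."""
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def worker(tasks, results):
    """Consume messages until the task queue is empty; publish partial results."""
    while not tasks.empty():
        piece = tasks.get()
        results.put(sum(piece))  # placeholder for the real per-piece calculation


def run_large_calculation(numbers, chunk_size=3):
    """Play the role of the requesting app: fan out, then consolidate."""
    tasks, results = Queue(), Queue()
    for piece in split(numbers, chunk_size):
        tasks.put(piece)

    worker(tasks, results)  # one in-process worker, for illustration only

    total = 0
    while not results.empty():
        total += results.get()
    return total
```

In the real system the workers would run in separate processes or machines, which is exactly why they can end up reading the database at different moments.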
The problem
So far, so good. The problem arises when we include a web service that updates the information in the database. An update can happen at any time, but it is critical that every piece of a "large calculation" originating from the same "Requesting App" sees the same data in the database.
For example:
I just can't have worker W2 using state S1 of the database. For the whole calculation to be consistent, it should use the previous state, S0.
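The inconsistency can be reproduced in a few lines. This is a toy sketch, not the real system: a plain dict stands in for the database, and the `rate` field and the function names are made up for illustration.

```python
def process_piece(database, piece):
    """A worker reads the shared value at the moment it happens to run."""
    return piece * database["rate"]


def run_without_snapshot():
    database = {"rate": 1.0}            # state S0
    r1 = process_piece(database, 10)    # W1 runs against S0
    database["rate"] = 2.0              # the web service updates the data: state S1
    r2 = process_piece(database, 10)    # W2 runs against S1
    return r1, r2
```

Both pieces belong to the same large calculation, yet they were computed from different database states, so the consolidated result is not consistent with either S0 or S1.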
Ideas
A lock pattern that prevents the web service from changing the database while any worker is reading from it.
A new layer between the database and the workers (a server that controls database caching per requesting app).
I am leaning toward the second solution, but I am not very confident about it.
Any brilliant ideas? Am I designing it wrong, or missing something?
OBS:
Thanks, everyone, for the help.
Since I believe this problem might be common in other scenarios, I would like to share the solution we chose.
Thinking the problem through more thoroughly, I understood it for what it really is.
Now that the calculation has evolved to be distributed, I just needed to evolve my cache to be distributed as well.
To do that, we chose an in-memory database (hash-value), deployed as a separate server (Redis, in this case).
Now, every time I start a job, I create an ID for the job and pass it along in the job's messages.
When a worker wants some information from the database, it would:
At the end of the job, I clear all hashes associated with the job ID.
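A rough sketch of how such a job-scoped cache could look follows. The worker's exact steps are not listed above, so this assumes a common read-through pattern: look in the job's hash first and fall back to the database only on a miss. Everything named here (`InMemoryHashStore`, `read_for_job`, `finish_job`, the `job:<id>` key layout, JSON encoding) is a hypothetical choice for illustration. The stand-in store mimics only the `hget`, `hsetnx` and `delete` calls of a Redis client so the example runs without a server.

```python
import json
import uuid


class InMemoryHashStore:
    """Stand-in for a Redis client; mimics only hget, hsetnx and delete."""

    def __init__(self):
        self._hashes = {}

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)

    def hsetnx(self, name, key, value):
        # Set the field only if it does not exist yet (first writer wins).
        bucket = self._hashes.setdefault(name, {})
        if key in bucket:
            return 0
        bucket[key] = value
        return 1

    def delete(self, *names):
        return sum(1 for n in names if self._hashes.pop(n, None) is not None)


def new_job_id():
    """Create the ID that travels with every message of one job."""
    return uuid.uuid4().hex


def read_for_job(store, job_id, key, load_from_db):
    """Return the value pinned for this job, loading it from the DB on a miss."""
    name = f"job:{job_id}"  # assumed key layout: one hash per job
    cached = store.hget(name, key)
    if cached is None:
        # Two workers may miss at the same time; hsetnx keeps the first value,
        # and re-reading afterwards makes both workers use that same value.
        store.hsetnx(name, key, json.dumps(load_from_db(key)))
        cached = store.hget(name, key)
    return json.loads(cached)


def finish_job(store, job_id):
    """Clear everything cached for the job once its results are consolidated."""
    store.delete(f"job:{job_id}")
```

With this shape, a second worker in the same job keeps seeing the value the first worker cached, even if the web service has updated the database in between. One caveat of the sketch: a value is pinned on first read, not at job start, so a key that is first read after an update would reflect the newer state.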