Monday, May 15, 2023
Show HN: Capillaries: Distributed data processing with Go and Cassandra https://ift.tt/3ehMfZL
I started thinking about this approach after working on a large-scale project for a major financial company, where our group developed a distributed in-house data processing solution. On a regular basis, it ingested a few gigabytes of financial data and, within a tight SLA time limit, produced a lot of enriched, aggregated, and validated data for a number of customers. Sometimes the source data had errors, so operators with domain knowledge had to verify data validity at certain checkpoints, make corrections immediately, and manually re-run parts of the workflow. The solution involved complex web service orchestration and a custom database, and it was very demanding on infrastructure availability.

Capillaries is a built-from-scratch, open-source Go solution that does just that: it ingests data and applies user-defined transforms (Go one-liner expressions, Python formulas, joins, aggregations, denormalization), using Cassandra for intermediate data storage and RabbitMQ for task scheduling.

End users just have to provide:
- source data in CSV files;
- a Capillaries script (JSON file) that defines the workflow and the transforms;
- Python code that performs complex calculations (only if needed).

The whole data processing pipeline can be split into separate runs that can be started independently and re-run by the user if needed. The goal is to build a platform that tolerates database and processing node failures and lets users focus on data transform logic and data quality control.

The “Getting started” Docker-based demo calculates ARK funds performance, using EOD holdings and transactions data acquired from public sources. There are also integration tests that use non-financial data, and a test deploy tool that uses the OpenStack API for provisioning in the cloud.

https://capillaries.io May 16, 2023 at 03:13AM
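To make the idea of independently re-runnable runs more concrete, here is a minimal Python sketch. It is not the Capillaries API or script format: the `store` dictionary, the `run_enrich` and `run_aggregate` functions, and the sample rows are all hypothetical, with a plain in-memory dictionary standing in for intermediate storage. It only illustrates the pattern of an operator correcting intermediate data at a checkpoint and re-running just the downstream part.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for intermediate storage, keyed by run name.
store = {}


def run_enrich(source_rows):
    """Run 1: add a computed 'value' column to every source row."""
    store["enrich"] = [
        {**row, "value": row["qty"] * row["price"]} for row in source_rows
    ]


def run_aggregate():
    """Run 2: total value per fund, reading only the stored output of run 1."""
    totals = defaultdict(float)
    for row in store["enrich"]:
        totals[row["fund"]] += row["value"]
    store["aggregate"] = dict(totals)


# Made-up sample data; the second row carries a sign error in the price.
source = [
    {"fund": "A", "qty": 10, "price": 2.0},
    {"fund": "A", "qty": 5, "price": -4.0},
    {"fund": "B", "qty": 1, "price": 7.0},
]

# Initial pass: both runs execute in order.
run_enrich(source)
run_aggregate()
first_totals = store["aggregate"]

# Checkpoint: the operator fixes the bad row in the intermediate data,
# then re-runs only the aggregation; run 1 is not repeated.
bad_row = store["enrich"][1]
bad_row["price"] = 4.0
bad_row["value"] = bad_row["qty"] * bad_row["price"]
run_aggregate()
corrected_totals = store["aggregate"]

print(first_totals)
print(corrected_totals)
```

Because the second run reads only what the first run stored, the correction does not require re-ingesting the source data, which is the property the description above is after.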