We developed a complex reactive system to migrate big data for one of our EdTech projects. To build the solution, we applied actor model with Akka.NET, and we wanted to share our experience. The article could be interesting to owners of software products built using outdated/legacy systems who consider migrating the data into a newer systems, software engineers, and DevOps.
For one of our recent EdTech projects, we had to solve quite a challenging task of migrating a large bulk of data from the client’s legacy system. In a nutshell, the project was a featured K-12 learning management system (LMS) bringing US school websites, content management and group collaboration tools together. Over 12 years of the legacy system’s life, it accumulated thousands of users, each possessing files, content and other data. Totaled up, the system stored a few tens of millions of different objects (lesson plans, student home works, tests, attachments to posts and what not). We were tasked with carefully migrating all live users to an entirely new solution while keeping their data and the original functionality replicated.
Challenges of the Task
Building the migration system was a not-so-trivial task. The main challenges were:
- Limited time-frame for migration. Of course, it was to be done ASAP. Both new and old systems were in use while the migration, and the goal was to switch all users over to the new one and eventually freeze the legacy system.
- The amount of data was relatively big. The old system was almost 12 years old, so it contained >100K users and >20M of user resources.
- The user data was inconsistent and scattered due to long evolution of the legacy system. Some files were stored in databases and file systems, some were generated on the fly.
- The way data was structured differed between the old and new systems. Both systems had varied data-processing operations and mechanisms. For instance, in the new system, the data could be grouped per specific class or school year, which was impossible in the old system.
- We should have had minimum effect on the productivity of the old system. While migration, the throughout capacity was to be adjusted depending on its load so end-users never noticed it is under way.
- We had to keep the future product roadmap in mind, including the features the users would need in the new system.
Our data migration system was to address all of these challenges and be as effective and fault-tolerant as possible.
Solution We Built with Akka.NET
The tight schedule and large volumes of data involved in migrating meant that parallel execution was paramount. So, to build the migration system, we decided to use a toolkit for highly concurrent, distributed, and fault-tolerant applications called Akka.NET, based on .NET. We used this framework to build the solution with the so-called actor model; it allowed us to migrate big data efficiently and seamlessly.
The actor model was invented by Carl Hewitt in the early 1970’s as a model particularly suited to parallel and AI computing applications. Hewitt introduced actors as an alternative approach to managing parallel computations and distributed systems. The model became widespread by Erlang, a programming language and runtime environment with a built-in support for concurrency, distribution and fault-tolerance used by large telecommunication systems.
Within actor model context, the system is defined by the set of actors and the tasks they perform. Actors communicate by sending messages to one another. An actor receives messages in an asynchronous manner but processes them one by one, deciding what to do with a message based on its content and type. While one message is being processed, the rest of the unprocessed messages are stored in a “mailbox.” So, as a self-contained active entity, an actor a) specifies a new behavior for itself; b) can create other actors and c) sends a message to one or more actors. All actors operate concurrently, and all independent operations within an actor’s behavior are also concurrent. Non-blocking communication allows actors to only consume resources while active, leading to less system overhead.
The application could be viewed as an array of isolated actors and supervisors-actors. When any of the supervising actors fails, the supervisor could recover it by represtination. Thus, the applications built with actor model are self-restoring. We used this when building our migration system with multiple retry strategies and maximum fault-tolerance.
The actor model alleviates the developer from having to deal with explicit locking and thread management. The dynamic nature of the actor model allows for adaptation to the system’s current requirements and load. So, our migration system was reactive to the load of the old system, which we were not to overload. This means that migration was continuously running on the background and gained throughput at minimum load of the old system, i.e. weekends and early morning and late evening weekdays.
Using the actor model, we built a cool concurrent and secure migration system and carefully migrated all structured user data and files - overall, approximately 3 terabyte of information - from the old system into the new one. This took us almost 1.5 month.
Our migration system bore all four essential characteristics of a reactive system. It was
- responsive: focused on rapid and consistent response times;
- resilient: remained responsive to the failure of each isolated component. Recovery of each component was delegated to another, external one, thus the system as a whole was never compromised;
- elastic: stayed responsive under varying workloads and increased/decreased its capacity accordingly;
- message driven: applied asynchronous message-passing to connect components.
During migration, we inevitably faced some errors, so - for easier navigation in the loooong log file - we wrote a script to find the files that failed to migrate. When such a need appeared, we could find the file in question in the old system and copy in into the new one by hand.
With actor model, we built the migration system with further roadmap in mind. We knew the users would require a set of features in the new system, such as batch processing, email mail-outs, integration with School Information Systems, scheduled job execution among others. Using this model, we added these pieces of functionality into the new system.
- Logicify Best Practices for Knowledge Transfer Between Developers
- Logicify Debut in EdTech: How We Built an Online Platform for Inclusive Education