Thursday, May 8, 2014

Speeding Up Slow Processes

The Problem
I recently upgraded portions of a consumer portal identity management system. Part of this upgrade included caching charge records from the billing system, which runs on an IBM i. The API into this mainframe is slow, and what's worse, interactive mode is turned off during the nightly processing, which includes processing service orders, running reports, and finally executing any billing processes (the environment is turned over to batch mode).  This essentially means the API is down until interactive mode resumes.  The latter problem is primarily why the charge records of the billing system needed to be cached in a database that would be available 24x7 to the identity manager and related functions on the consumer portal. These billing elements are used to control the SAML assertions that are sent to relying parties (or service providers). They also control and define behaviors on a page that displays which applications are available to the customer (e.g. phone manager, email manager, etc.). The same page displays links to a number of 3rd party applications that are not SAML compliant and require identity information to be shared through various custom security schemes.

This is the second customer portal identity management system that I've worked on for this company. In the first system, when large changes like this are made, or when there have been significant problems with the interface that provides updates to customer information (think CRUD operations), an application called "AccountSync" is executed. The idea is that AccountSync updates the identity management system, and the related cached data, with current data from the billing system. Since this new system is only eight months old, there had been no need to run AccountSync against the IBMi billing system. There were still a few weeks before this information would be needed, but deploying the portions that populate the cached data early seemed a reasonable way of making sure all was in order before the final changes went to the live system. So the database was upgraded, the data contracts for the web services were updated with the new structures, and everything was working great: data was being populated as the system was notified of service orders being completed in the billing system. In addition to these changes, a small change to AccountSync was required because the mainframe is placed in batch mode during the nightly processing between 1900 and 0400. Once those changes were made the application was run.

And run it did...for a solid week. There are only ~40,000 customers! A solid week! The first system has ~66,000 customers, and its AccountSync takes only ~2.5 days. Clearly something had to be done to make this process more efficient, especially when the long term goal is to migrate all customers onto one consumer portal.

The First Solution
The first thing I considered was running several copies of AccountSync and splitting the work between them by giving each process its own small subset of customers retrieved from the database.  While an easy and cheap solution to the problem, it could not run without being re-balanced by hand before each execution.  And frankly, the 'solution' was a kludge.

The Second (and final) Solution
After discarding the first idea I started to investigate the parallel programming additions in .NET 4.5 and was quickly rewarded when I came across TPL Dataflow.  This isn't something that ships with .NET 4.5; rather, it is a library that you install via NuGet from within Visual Studio.
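Once the package is installed, everything used below comes out of a single namespace:
using System.Threading.Tasks.Dataflow;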

Coding a multi-threaded application was extremely easy with this library.  I first defined a BufferBlock to hold the list of customers that needed to be processed.
private static readonly BufferBlock<long> CustomerBufferBlock = new BufferBlock<long>();
I then loaded up this buffer with the list of customers. The activeCustomers in this example is a list of customer Id's (longs) obtained from the database.
activeCustomers.ForEach(customerId => CustomerBufferBlock.SendAsync(customerId));
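As an aside, nothing awaits those SendAsync tasks; since this buffer is unbounded, the synchronous Post would (as far as I can tell) do the same job here:
activeCustomers.ForEach(customerId => CustomerBufferBlock.Post(customerId));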
Next, an ActionBlock needs to be defined.  This is kind of tricky: within the ActionBlock you tell it what method to run and what the parameters to that method are, and optionally (I actually recommend it) how many threads to execute.  I set up AccountSync to run the number of threads defined in the configuration file, so the process can be throttled up or down depending upon need and performance.
// initialize the ActionBlock
ActionBlock<long> customerActions = new ActionBlock<long>(s => ProcessCustomer(s, dao, sleepingHour, restartHour),
        new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = threadCount });
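For reference, reading the throttle from App.config could look something like this (the ThreadCount key name is just an example, not necessarily what the real configuration uses):
// requires a reference to System.Configuration
// App.config: <add key="ThreadCount" value="5" />
int threadCount = int.Parse(ConfigurationManager.AppSettings["ThreadCount"] ?? "1");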
The method, ProcessCustomer, needs to be defined in a certain fashion (it must return a Task) in order for the ActionBlock to be notified of completion events.  You will notice that I pass in parameters for when a thread should sleep and when it should wake up.  This was necessary so that each thread could be forced to sleep while the IBMi was in batch mode and then resume when interactive mode returned the next morning.
private static async Task ProcessCustomer(long customerId, IPortalDBDao dao, int sleepingHour, int reStartHour)
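The body isn't shown here, but a minimal sketch of its shape, assuming the 1900-0400 batch window and hypothetical dao methods (GetChargeRecords and UpdateCachedCharges are stand-ins for the real calls), would be:
private static async Task ProcessCustomer(long customerId, IPortalDBDao dao, int sleepingHour, int reStartHour)
{
    // If the IBMi is in batch mode, park this thread until interactive mode resumes.
    int hour = DateTime.Now.Hour;
    if (hour >= sleepingHour || hour < reStartHour)
    {
        DateTime wakeTime = hour >= sleepingHour
            ? DateTime.Today.AddDays(1).AddHours(reStartHour)  // sleep through midnight
            : DateTime.Today.AddHours(reStartHour);            // already past midnight
        await Task.Delay(wakeTime - DateTime.Now);
    }

    // Pull current charge records from the billing API and refresh the cache
    // (hypothetical dao members, shown only to illustrate the shape).
    var charges = dao.GetChargeRecords(customerId);
    dao.UpdateCachedCharges(customerId, charges);
}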
Finally I needed to link the BufferBlock to the ActionBlock and begin executing the Tasks.
// Link BufferBlock to ActionBlock
CustomerBufferBlock.LinkTo(customerActions);
CustomerBufferBlock.Completion.ContinueWith(task => customerActions.Complete());

// Tell the buffer block I'm done populating it with data - it appears you
// can continue to load the buffer as needed while it runs.
CustomerBufferBlock.Complete();

// Now wait until all the items in the buffer block have been processed by the action block.
customerActions.Completion.Wait();
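As an aside, the library can propagate completion for you when linking; if I read the docs right, the ContinueWith line above could be replaced with:
CustomerBufferBlock.LinkTo(customerActions, new DataflowLinkOptions { PropagateCompletion = true });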
Once I made these small changes I ran through the process a few times in single-threaded mode to ensure that the Task behaved correctly with the sleep/wake parameters. Then I slowly ratcheted up the number of threads until I started to see diminishing returns in performance.  I found that, locally, running five threads performed about 60% better than running single-threaded.

Conclusion
When everything seemed to be working and I had found the sweet spot between thread count and performance, I ran this in production.  And it ran...it finished the job in just under 28 hours.  This was done without really pushing the processing power of the systems involved, so next time this is executed I'll try adjusting the thread count to see if I can get the process to run even faster.  But I am more than pleased with the results the first time around.