Reprocess Multiple Days of Failed Executions Easily with a Retry Schedule
The retry works by re-using the documents that were picked up by the start shape. If your process does not have a connector start shape, or the start shape does not return the relevant data to be processed, then this approach is unlikely to accomplish what you need and you should come up with a different method for reprocessing failed documents.
This article is based on a real production situation. I took advantage of it and documented the process for your benefit.
- Before I enabled the retry schedule, I analyzed my process to be sure it would work. My process starts with a connector, so I'm good there. It does use a persisted Dynamic Process Property (DPP) to determine the starting point for the next query, but as an added bonus, the process is completely retry friendly because it only updates this value IF the execution is not a retry (see the Groovy sketch after this walkthrough). I could still reprocess without this special handling, especially for a one-time fix, but I would need to note the stored DPP value and repopulate it before starting up regular processing again. Because I have this handling, I will not need to do that. I will still note the DPP starting value, though, just in case. (Stop the schedules on the process and make sure it isn't running before capturing this value.)

- I will stop the standard schedule and start the retry schedule for this one-time reprocessing of multiple days' worth of failures.
- I will make a note of the existing standard execution schedule and then delete it. Deleting is required because you cannot pause the two schedules independently. I don't think this step is strictly necessary, especially with how my process is set up; it's just an added precaution.
- At the same time, I will add my retry schedule. I set it to run every 5 minutes, but I don't think this value matters very much. The retry appears to pick up all the un-retried failures in a very systematic way, starting at your first scheduled trigger, so a single trigger may be all you need.
- I set my Maximum Number of Retries to '1'. I assumed '0' was the correct choice and would mean it tries once, but I wasn't sure, so I chose '1' to be safe. My errors were due to a temporary issue with the destination endpoint, so retrying more than once is unnecessary.

- After I 'OK' my schedule changes, the result is impressive. I would like to draw your attention to a few things.
- Based on the appearance, I believe it adds all the failed executions to a queue and then processes them sequentially. The start times will all be very close together, while the reported processing time will be very long for the last executions processed and very short for the first. In my example, 188 executions were retried; they all showed start times within 1 minute of each other, but they did not all finish until around 1 hour 20 minutes later.
- It only reprocesses the failed start shape documents. It will not reprocess the successful ones.
- It's very automatic. With just a few steps I'm reprocessing ALL the un-retried failures for as far back as it has data. I believe this range depends on your Account Property: Purge History After X Days value (in a public Atom Cloud, changing this value does not necessarily have any effect). In my example it was set to 14 days, but most failures were within the last 2 days.
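For readers who want to build the same retry-friendly behavior, here is a minimal sketch of a Groovy Data Process scripting step that guards the persisted DPP update. `ExecutionUtil.getDynamicProcessProperty` and `setDynamicProcessProperty` are the standard Boomi scripting calls (the third argument to `setDynamicProcessProperty` controls persistence), but the property names `IS_RETRY` and `LAST_QUERY_DATE` are hypothetical stand-ins, and how you detect a retry execution depends on your own process design.

```groovy
import java.util.Properties
import java.io.InputStream
import com.boomi.execution.ExecutionUtil

// "IS_RETRY" is a hypothetical flag set earlier in the process; replace it
// with however your design distinguishes a retry from a normal execution.
boolean isRetry = "true".equals(ExecutionUtil.getDynamicProcessProperty("IS_RETRY"))

if (!isRetry) {
    // Normal run: advance the persisted watermark the next query starts from.
    // "LAST_QUERY_DATE" is a hypothetical property name, and deriving the
    // value from the current time is purely illustrative. The third argument
    // (true) persists the value across executions.
    String newWatermark = new Date().format("yyyy-MM-dd'T'HH:mm:ss")
    ExecutionUtil.setDynamicProcessProperty("LAST_QUERY_DATE", newWatermark, true)
}
// On a retry, the persisted value is left untouched, so regular processing
// resumes exactly where it left off once the retry schedule is removed.

// Pass every document through unchanged.
for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream is = dataContext.getStream(i)
    Properties props = dataContext.getProperties(i)
    dataContext.storeStream(is, props)
}
```

The payoff of guarding the update this way is exactly what the walkthrough shows: you can point a retry schedule at days of failures without first capturing and restoring the DPP value by hand.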

More about my specific example
My example was 188 executions containing 266 failed orders. It appeared to be a database issue that failed only certain documents within each batch, so almost every execution had a mix of successful and failed documents. I was able to reprocess all of these with about 5 minutes of setup and 1 hour and 20 minutes of processing time. No other solution I could have used would have taken only 5 minutes of my time.
Think about how you want to handle reprocessing failures
My process was specifically designed to utilize retries, and to utilize them gracefully. This article is intended to do two things:
- Show you how the retry schedule actually works
- Give you a reason to consider building processes with this in mind
There are many reasons why some use cases can’t or shouldn’t be built this way. Please do not take this as the only way you should consider designing a process.
Use this for connectivity issues or hotfixes
In my example it was a connectivity issue that caused the errors, so nothing in the process had to change, but this approach is completely viable for correcting a defect in the process as well. As long as your fix does not depend on data in the start shape document that wasn't there before, all you have to do is build and test the fix. Then, after you deploy it to production, you can use these same steps to reprocess all the failures under the newly deployed code.