Demandware Best Practices: Data Migration and Replication
This is the first entry in a series of blog posts on Demandware Best Practices. Demandware publishes a set of these practices for handling most tasks within a client’s Organization. We will examine those documents and detail how lyonscg applies the recommendations in our own work.
First, the lifeblood of any organization is its data. We will discuss five of the points for data replication discussed here, expanding on the reasoning behind them and describing how lyonscg implements these practices.
Data Replication Timing
1. Replication should be run during low traffic hours since this is a database intensive process. We recommend running this during late night, early morning hours.
Our first task is to establish the organization’s time zone. By default, a Demandware instance is set to Etc/UTC, which is equivalent to GMT. By setting the instance time zone appropriately for the client (via Administration -> Global Preferences -> Instance Time Zone), the question “When is your low-traffic time?” becomes easy to answer, and the job scheduler can time jobs correctly. Note that if a job schedule is set in one time zone and the time zone is then changed, the job’s run time is automatically adjusted to the new zone. If a job is set for 1 AM UTC and the time zone is then changed to GMT-8 (US Pacific), the job time becomes 5 PM the previous day.
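This adjustment can be illustrated with a short Python sketch. The date and times are arbitrary examples, and a fixed GMT-8 offset stands in for a DST-aware Pacific zone:

```python
from datetime import datetime, timedelta, timezone

# A fixed GMT-8 offset stands in for the US Pacific time zone (ignoring DST).
gmt_minus_8 = timezone(timedelta(hours=-8))

# A job originally scheduled for 1 AM UTC...
job_time_utc = datetime(2016, 6, 1, 1, 0, tzinfo=timezone.utc)

# ...falls at 5 PM the previous day once the instance is moved to GMT-8.
job_time_local = job_time_utc.astimezone(gmt_minus_8)
print(job_time_local.strftime("%Y-%m-%d %I:%M %p"))  # 2016-05-31 05:00 PM
```

The wall-clock moment the job runs does not move; only its displayed local time does, which is why a schedule set before a time-zone change can end up at an unexpected hour.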
It must also be noted when ingested feeds arrive. If a job is scheduled for 1 AM but the client’s FTP server is also scheduled to receive the data at that time, the job may not pick up any files until the next day. At lyonscg, we verify when files are sent to the server, which time zone that delivery time references, and that the job is scheduled to run after the feed has had ample time to upload.
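The feed-timing check can be sketched as a small Python helper. The file path and the 15-minute settle window are hypothetical, and an actual Demandware job step would be implemented in Demandware script rather than Python:

```python
import os
import time

def feed_is_ready(path, settle_seconds=900):
    """Return True only if the feed file exists and has not been touched
    for at least `settle_seconds`, suggesting the upload has finished."""
    if not os.path.exists(path):
        return False
    age = time.time() - os.path.getmtime(path)
    return age >= settle_seconds

# Hypothetical guard at the top of a scheduled import job:
if feed_is_ready("/feeds/catalog_export.xml"):
    print("Feed present and settled; safe to import.")
else:
    print("Feed missing or still uploading; skip until the next run.")
```

Guarding on the file’s modification time avoids importing a feed that is still mid-upload, which is the same failure mode as scheduling the job before the feed has arrived.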
Data replication should be scheduled after the jobs that produce the data being replicated have run, on a schedule confirmed by the client.
2. No manual edits, updates or changes should be made in business manager on target or source system when replication process is running. Also make sure that there are no running jobs on the target system which change the same data as being replicated.
This may seem a natural byproduct of running during non-peak hours, but jobs may be run manually at any point during the day. At a minimum, before running such a job, a communication list should be used to verify that no users are currently editing and that no edits will be attempted until a follow-up email is received. The user performing the replication is notified via email once the replication has finished; that user should then notify the group that editing may resume.
Instance Flow Logic
4. All data must be staged to development first. Then data should be verified. For example if one replicates a search index to production, verify that search is working after replication. If replicating content or products make sure they are working as designed after replication.
This is the situation that is most difficult for traditional developers to understand. In a typical software engineering workflow, work is done on a “QA” instance and tested. It is then sent to a Staging instance, where it is verified. Finally, everything is pushed from Staging to Production. In Demandware, Staging is the “source of truth.” Everything flows from Staging. Data and code can be replicated from Staging, but never to Staging. Instead of an “A to B to C” pattern, it is more “A to B, then A to C.”
Work (content creation, content slot scheduling, campaigns / promotions / coupons) is done on Staging, then sent to the “QA” (Development) instance for verification. Once verified on Development, a second replication, from Staging to Production, is done. If a flaw is found on the Development instance, it is corrected on Staging and a new replication to Development is performed.
We typically refer to this as “Demandware World,” because it is very different from traditional practice. However, the difference is mainly nominal, as the “Development” instance takes the place of a traditional “Staging” instance. Staging behaves as the traditional “QA” instance, even though QA would take place on Development. These three instances – Staging, Development, Production – comprise what is termed the PIG – Primary Instance Group. The main drawback to this approach is that if a replication is deemed unfit for Production, then the next replication would likely include new Content that was not present in the previous replication. Also, code development is always ongoing. A later replication may include code committed since the previous replication.
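The “A to B, then A to C” pattern can be summarized in a purely illustrative Python sketch. The function names here are invented for this example; actual replication processes are configured in Business Manager:

```python
def replicate(source, target):
    # Stand-in for a Business Manager data-replication process.
    return f"{source} -> {target}"

# Staging is the source of truth: data flows from it, never to it.
steps = []
steps.append(replicate("Staging", "Development"))  # A -> B: verify the data here
# ...verification on Development passes...
steps.append(replicate("Staging", "Production"))   # A -> C: second replication
print(steps)  # ['Staging -> Development', 'Staging -> Production']
```

Note that Production never receives data from Development; both replications originate from Staging, which is exactly what makes an unfit replication awkward to re-stage later.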
At lyonscg, we address the issues with code replication by building code directly to the Development instance. This allows us to deploy code independently to both the Staging and Development instances. By doing this, we can roll back any code that fails QA with a separate build to Staging. Traditional Demandware code replication from Staging to Development is an all-or-nothing approach: if a code replication fails QA on Development, a new code build must be made to Staging with the failed changes removed. lyonscg’s adjustment to this practice allows code approved in QA to be built to Staging, then replicated to Production.
Undoing a replication
7. There is an undo feature that can roll back the last staging that was run. Note this can only undo the last process, which is another reason to verify data. If your replicated data is not working as designed you can revert the application back to the state it was in before replication was run by using this process.
This is fairly straightforward, but it should be noted that, unlike code versions, data replications are not maintained as a history; only the most recent replication can be undone. If a flaw is found in replicated Content, all other pieces of the replication should be verified before performing a follow-up replication. In our experience, an undo can also cause temporary static content issues, specifically with cached images: because the Akamai CDN cache can take up to 15 minutes to refresh, the site may appear to have image problems for that entire period after a rollback.
Replicating Multiple Objects
10. If replicating multiple objects within a task and the process fails, one should try to create a process with only one object and see if it completes. The staging logs may also shed light on the group that caused the failure.
It is good practice to schedule Content-only replications separately from, say, Catalog replications. On previous projects, Content has been replicated automatically at several points in the day: in the morning, in the afternoon, and finally at close of business. By limiting the objects replicated in any given job, points of failure can be identified quickly. There is also no need to schedule full data replications if a catalog import job runs only once per day; ideally, an organization would schedule a catalog replication during late-night / early-morning hours, matched to the frequency of the catalog import job.
In all, the referenced Demandware document lists thirteen best-practice items for data replication. We have examined five of them and how Lyons Consulting Group applies them in our Demandware implementations. By following these guidelines, clients can avoid outages caused by failed or corrupt data migrations, and the financial losses that accompany such failures.
Click here to learn more about the lyonscg + Demandware solution.