
SaaS: Tips and Pitfalls of Mass Data Uploads

June 15, 2008


Summary

Software-as-a-Service (SaaS) solutions are only as effective as the data they contain. Moving legacy data into these systems presents challenges to many enterprises.
Moving from a pilot SaaS project to production mode, or increasing the scope of an implementation, requires bulk data uploads. This research note offers the following coverage:
» An introduction to ServiceXen’s SaaS Data Lifecycle.
» Third-party applications to facilitate bulk uploads.
» Triggers for developing custom connectors from vendor APIs.
» Pitfalls for custom connector development.
Preparing for SaaS bulk uploads enables enterprises to support growth and expansion in a controlled manner.

Optimization Point

Small and mid-sized businesses are turning to on-demand applications for functions such as CRM. Using Software-as-a-Service (SaaS) enables these enterprises to avoid many upfront costs and to benefit from rapid implementations. SaaS also has a dark side, notably the loss of control over data. Furthermore, few enterprises consider the implications of data control when evaluating different SaaS offerings.
All SaaS offerings provide templates for uploading production data. These templates are only effective for addressing the requirements of pilot projects. Supporting the rest of the SaaS Data Lifecycle, especially importing production data from legacy systems, requires bulk uploads.
This note addresses the batch transfer requirements for moving SaaS implementations to production levels and for expanding the scale of existing SaaS implementations.

Key Considerations

SaaS implementations for enterprise applications like ERP and CRM depend on data. SaaS data follows a complete lifecycle, and that lifecycle must be considered to ensure both efficient implementation and effective ongoing maintenance of a SaaS project. It has five steps:
1. Pilot project to ensure that the SaaS implementation develops the appropriate level of functionality.
2. Batch transfer for production to pre-populate the system with the appropriate data.
3. Batch transfer for expansion to increase the coverage (adding users, business units, or geographic units) and functionality (adding new fields for service, maintenance, or warranties) of the implementation.
4. Contraction is the effective opposite of expansion: data must be purged when the coverage or functionality of the SaaS implementation is reduced. This data may be deleted, moved to other enterprise applications, or retained as records to address legal or compliance requirements.
5. Termination occurs at the end of a SaaS implementation. While the project may end, the data must live on in other enterprise applications or as retained records.

Improvement & Optimization

To manage large data uploads effectively, SaaS clients have two options: use one of a variety of third-party tools to facilitate the bulk transfer process, or take a custom approach to bulk uploads using the APIs offered by large SaaS vendors.
ServiceXen’s first recommendation describes some of the available tools for bulk data uploads. Subsequent recommendations address the creation of custom adapters using vendor-provided APIs.

1. Leverage vendor offerings. There is a wide variety of tools that facilitate the batch upload process.
» Third-party offerings like Pervasive Data Catapult and Scribe Insight support complicated, large, multi-step data transfers. Other offerings specialize in integrating data from specific legacy applications. GoldBox, for example, can access relational data from GoldMine that template-based tools can’t. Scribe’s ActNow provides a similar function for ACT!
» Bulk transfer applications are available from certain vendors. NetSuite gracefully handles large files formatted in smbXML (Small Business XML). Other vendors can handle data formatted as Microsoft Excel spreadsheets or comma-delimited files.
» Community-developed options are becoming increasingly popular. As vendors such as NetSuite and Salesforce.com mature, they develop large communities of independent developers and consultants, resulting in many useful utilities. Both the Data Loader and the Excel Connector, for example, emerged from Salesforce.com’s AppExchange initiative.
2. Know when to create a custom adapter. Some enterprises may elect to write their own connector to facilitate bulk uploads. There are two triggers for this scenario:
» Real-time integration with other systems. Some enterprises may require near real-time integration with other applications such as ERP, marketing automation, or customer service applications.
» Excessive scale and complexity. Third-party applications for bulk uploads will balk at transfers characterized by:
– Over 200,000 records.
– Over 1,000 data fields.
3. Avoid the development pitfalls. Most large SaaS vendors take the same approach to infrastructure, running the entire application off a single database instance. While the vendors gain advantages through ease of maintenance, this architecture presents challenges to customers that build on the APIs to upload data:
» It takes time. Moving batches of hundreds of thousands or millions of records takes a considerable amount of time. The connector must be tuned to improve performance: even small improvements in the time to transmit a single record can yield dramatic improvements in overall batch transfer times.
» Persistent connections are a must. Negotiating thousands or millions of individual connections for a batch transfer will cripple performance. Use the persistent connection feature of the HTTP/1.1 standard; a connection-reuse sketch follows this list. Development frameworks differ in their support for persistent connections: the .NET framework implements persistence by default, while Apache Axis does not.
» SSL negotiation is expensive. A further advantage of persistent connections is avoiding repeated SSL negotiation. Secure connections are essential for transmitting data to SaaS projects, but each SSL handshake carries considerable overhead and degrades performance, so negotiate once and reuse the connection.
» Use multiple records per INSERT or UPDATE statement. Most SaaS products enable users to modify or insert multiple records per transaction. Salesforce.com, for example, allows up to 200 records per statement.
» Compress. SaaS APIs generally offer interfaces for files compressed using gzip. If possible, transmit only compressed copies of batched data; the second sketch after this list combines record batching with gzip compression.
» Multiple threads don’t help. Starting multiple sessions to transmit batched data will not improve performance, due to the complications of authentication; it may even slow overall transfer times.
» Lock out unnecessary features. SaaS applications often offer features that improve a user’s experience. Salesforce.com, for example, offers a Most Recently Used (MRU) feature that displays recently accessed data in the user interface. During batch updates the application constantly refreshes the MRU list, degrading performance. These usability features aren’t necessary during batch transfers, so lock them out.
4. Use consulting help if you need it. Some organizations may lack the skills to develop their own batch transfer tools. Others may find that the quality of their batch processes is not sufficient, resulting in incorrect records or the need to rekey data. For these enterprises, SaaS vendors offer professional services that assist with data transfer. These services may be delivered by the vendors themselves or by affiliated consultants. A variety of specialist firms can also perform tasks such as cleaning data and filtering duplicate records.
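
The following sketch illustrates the persistent-connection and SSL advice above. It is a minimal example in Python using the requests library; the endpoint URL, authentication scheme, and record format are hypothetical placeholders rather than any particular vendor’s API. The point is that a single Session object keeps the underlying TCP connection open (HTTP/1.1 keep-alive), so the expensive SSL handshake is negotiated once rather than once per record.

import requests

# Hypothetical endpoint and credentials -- substitute your vendor's
# actual bulk API details.
API_URL = "https://api.example-saas.com/records"
AUTH_TOKEN = "replace-with-real-token"

# A Session reuses the underlying TCP connection across requests,
# so the SSL handshake is performed once rather than per record.
session = requests.Session()
session.headers.update({"Authorization": "Bearer " + AUTH_TOKEN})

def upload_record(record):
    """Send one record over the persistent, already-negotiated connection."""
    response = session.post(API_URL, json=record, timeout=30)
    response.raise_for_status()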
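
The second sketch combines the batching and compression advice: records are chunked into vendor-sized batches (200 here, mirroring the Salesforce.com figure above; check your own vendor’s limit) and each payload is gzip-compressed before transmission. Again, the bulk endpoint and JSON payload format are assumptions for illustration, and the persistent session from the previous sketch is reused.

import gzip
import json
import requests

# Hypothetical bulk endpoint; real vendors document their own paths,
# payload formats (smbXML, CSV, etc.), and per-call record limits.
BULK_URL = "https://api.example-saas.com/records/bulk"
BATCH_SIZE = 200  # e.g. Salesforce.com accepts up to 200 records per call

def upload_in_batches(records, session):
    """Send records in gzip-compressed batches over a persistent session."""
    for start in range(0, len(records), BATCH_SIZE):
        batch = records[start:start + BATCH_SIZE]
        # Compress the serialized batch; bulk APIs that accept gzip
        # expect the Content-Encoding header to be set accordingly.
        payload = gzip.compress(json.dumps(batch).encode("utf-8"))
        response = session.post(
            BULK_URL,
            data=payload,
            headers={
                "Content-Type": "application/json",
                "Content-Encoding": "gzip",
            },
            timeout=120,
        )
        response.raise_for_status()

upload_in_batches expects the requests.Session created in the previous sketch, so the connection-reuse, batching, and compression savings all apply to every call.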

Bottom Line

Enterprises must carefully consider how to support bulk uploads of legacy data into SaaS implementations. Consider the entire data lifecycle, and recognize that bulk uploads are required for moving large volumes of data and that these uploads must be carefully planned and executed.
