Thursday, July 06, 2006

No Pie in the Sky - The Often Overlooked Challenges of BI/GIS Integration

location based services

The integration of Business Intelligence (BI) with GIS has been the subject of much promotion, most of it well deserved. Some of this attention, however, has conveyed an overly optimistic message, and could lead to the misconception that a transparent integration can be easily obtained. Managers should be familiar with the potential complexities of spatially based technologies before adding a location element to their BI system.

Frustration can result from the false notion that GIS can be forced into 'mainstream IT' without the proper planning and design efforts, and this is particularly true of BI. Some aspects of spatially based systems retain unique requirements that do not - and should not - fall within the parameters of conventional IT projects. Expectations should be set accordingly.

Still, the value of integrating GIS with a BI system should not be underestimated. Spatial intelligence is highly underrated in the BI realm, and it offers tremendous growth potential as a valuable tool. But software vendors often purport seamless integration of their products with BI through the use of spatial data types, plug-ins and middleware components (often labeled a 'connector'). These offerings may be adequate for simple applications, but complex data models and specialized business logic (as often seen in BI) can marginalize the effectiveness of these tools and create unforeseen integration headaches.

In a BI system, data usually resides within a carefully designed schema that promotes reporting and ad-hoc analysis. The level of effort required to provide a spatial aspect to these applications will, of course, be a function of the complexity of the desired analysis. But organizations often face circumstances that impose greater, frequently overlooked design requirements before a true systems integration can proceed.

Sales demonstrations of BI/GIS integrations will often focus on simplistic examples such as a map-based report containing a choropleth (color-shaded) map depicting the intensity of some variable by ZIP Code, county or administrative region. I like to think of these as 'rabbit in the hat' examples. Call me David Copperfield, but to the informed eye, it is easy to tell right where the rabbit is coming from. In other words, these 'canned' examples frequently depict BI/GIS integrations in the most basic of terms. Unfortunately, things tend to be a little more complicated in the real world, and the demands of business for spatial analysis usually require more sophistication than floating a few pie charts over a map.

This should be of particular interest to IT procurement officers, who are more often than not new to spatial technology. Decision makers such as these should be cautious of claims about how easy it is for a spatial component to sit on top of a BI system.

This discussion considers three 'red flags' which are among the many potential BI/GIS integration challenges that can merit additional planning and increase the need for customized solutions. Each of these involves complexities introduced through the use of 'areal units.' While it may sound like another example of GIS-geek speak, the concept of areal units is simple and pervades almost all GIS applications. In this context, areal units are simply how an organization chooses to geographically define and administer its business. Examples include standard boundaries such as ZIP Codes, counties and block groups, or custom boundaries such as trade areas, police beats and sales territories. If your organization 'thinks' in terms of areal units, more careful consideration needs to be devoted to the data flows between the BI and the GIS before embarking on an integration.

Let's consider three common situations that can lead to complexity:
Your BI system is based on data that represent point locations, but you want to visualize your business data by areal units (e.g. store visits by trade area, crime locations by beat or prospect addresses by sales territory).
Your areal units change. Do your administrative regions change? What about when ZIP or area codes are added or otherwise modified?
A need exists to maintain a historic view of your areal units.

Your BI system is based on data that represents point locations

What about the common situation where a BI system presents the user with dozens of variables to drill down on? After the user has 'sliced and diced' the data to his or her heart's content, the next step might be to see this information mapped. Frequently, the data selected from the BI system translates to point locations. A customer's home, service calls, traffic accidents, disease reports or new home construction are all examples of point data.

For purposes of visualization, the user often wants to see the frequency of those occurrences displayed as a shaded 'rate' map based on some areal unit important to the organization. For instance, a police analyst may have carefully selected certain felonies committed within a specific time period, and would now like to see those mapped by beat. Or, a marketing analyst may be interested in where a specific profile of customer is buying the newest product line. Both of these analyses would rely on querying the BI system, but how do these regions (in this example, police beats and trade areas) 'know' about the density of these data subsets within their boundaries?
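To make that question concrete, here is a minimal sketch of one way a region could be told how many selected points fall inside it: a plain point-in-polygon count. It assumes each areal unit is a simple polygon given as a list of (x, y) vertices and that the BI subset arrives as a list of (x, y) points. The function names, beat names and coordinates are all hypothetical, and a production GIS or spatial database would normally do this with spatial indexes rather than a brute-force loop.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices. Points exactly on an edge
    are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_areal_unit(points, areal_units):
    """Count the points falling in each areal unit.

    points: iterable of (x, y) tuples (e.g. a subset selected in the BI system)
    areal_units: dict mapping unit name -> list of polygon vertices
    Points outside every unit are tallied under the key None.
    """
    counts = Counter({name: 0 for name in areal_units})
    for x, y in points:
        unit = next(
            (name for name, poly in areal_units.items()
             if point_in_polygon(x, y, poly)),
            None,
        )
        counts[unit] += 1
    return dict(counts)


# Hypothetical data: two square police beats and four incident locations.
beats = {
    "beat_A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "beat_B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
incidents = [(2, 3), (5, 5), (12, 4), (25, 1)]
beat_counts = count_by_areal_unit(incidents, beats)
```

Every new selection in the BI system means running a count like this again over the chosen points, which is why the cost grows quickly as subsets and boundaries multiply.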
Because a BI system enables virtually unlimited possibilities for sub-setting data, keeping areal units informed about the points inside them becomes a distinct challenge. There are several possible solutions, but each comes with its own limitations. For instance, one solution is to maintain identifiers that link point-based data with areal data, but you must then join the data 'on-the-fly' to the actual geographic boundaries, and have a process in place for maintaining the identifiers on the point data. This is effective at coarse geographies, but performance could be impaired for fine-level analysis. You would also need to ensure your transactional system is properly populating each point record with the identifier of the areal unit to which it belongs. Alternatively, you could simply pre-process the areal units with the BI data, but this becomes challenging when multiple data elements are involved, with hundreds or even thousands of possible combinations.

Your areal units change

More often than not, boundaries are dynamic. Sales territories are reallocated, congressional districts redrawn, and new ZIP Codes are added. When these changes occur, lapses in data integrity can accompany them, and specious compromises in business logic are possible. Take customer Johnson, who typically loads up on $700 in widgets per month at store 1,000. Last month trade area 1,000 was modified, and Johnson now resides within the trade area of newly opened store 1,001. Johnson could of course remain loyal to store 1,000, but if we want to analyze sales by trade area we might need to distinguish between sales potential (which trade area Johnson resides within) and actual sales (where Johnson shops).

Or maybe we are tracking text messages sent by cellular coverage area. New cell towers are added and tuned, and these coverage boundaries change. Again, there is potential for flaws depending on how the data is maintained and linked between the BI and GIS.

More complexity is introduced when demographic data (population, income) is updated. As delivered by vendors or public sources, this data is almost always packaged within federal census boundaries. Because your areal units may not be coincident with these boundaries (e.g. census block groups), additional processes might need to be built to reflect the new data. And is there a need to backfill your old data when new ZIP Codes are added or modified? Questions like these are typically overlooked by those accustomed to BI, but new to GIS. Again, tabular approaches to this problem (such as employing database views to link spatial data with demographic data) are a possibility, but when actual boundaries are dynamic the issue becomes more complicated.

A need exists to maintain a historic view of your areal units

History presents a complicating dimension to any GIS effort, and previously unforeseen issues can arise when historic data needs to be mapped. For instance, take disease and crime tracking systems. Here, you might want to see both individuals and incidents mapped. While the incident location for a particular disease or crime might be important for assessing causes using environmental factors, the current location of the infected or convicted individual (who could have moved, or been incarcerated!) might be important for other analyses. This is an example of different location 'types' whose importance, or even existence, may not become apparent until you begin location-enabling your BI system. This distinction must also be addressed with the subsequent linking of this information to areal units.

There are other concerns. Is the system set up to allow for multiple child 'events' to be tied to 'parent' locations?
This also has implications for historic analysis. To illustrate, think of food-poisoning reports. You might want to link these to a particular 'suspect' fast-food chain, but you would also want to have tight control over 'when' these reports are being mapped to determine responsibility.

While by no means 'show-stoppers,' these are examples of where a spatial component can place unique requirements on a BI system. Proper data design and administration can address these issues, but it is easy to overlook situations like these when the nuances of BI/GIS integration are not planned for.

Hopefully, it has become evident that GIS/BI integrations have the potential to be more complicated than they appear at first glance. These, and other challenges not discussed, are exacerbated by the inevitable communication gaps between BI and GIS technicians. Becoming fluent in the lingua franca of GIS can take a deep understanding of spatial technology, and years of experience. Many of the issues encountered while 'spatializing' a BI system cannot be addressed by the normal tabular-based tricks your garden-variety database administrator wields.
Even grasping the concepts can be difficult for those new to GIS, so a successful BI/GIS integration will involve an expert in spatial technology throughout the process.

All of this should serve as a caveat emptor to decision makers. Be wary of pie-in-the-sky sales pitches that claim BI systems can be spatialized with some middleware and a sprinkling of fairy dust. Canned solutions may work for basic applications, but you'll want to fully understand how your data and business processes fit into a GIS context before proceeding with a major integration effort. This is paramount to developing a sound model for your initial requirements. You will also want to consider how your design will scale and adjust to the demand for broadened functionality that inevitably occurs as organizations become comfortable with spatial technology and the value it offers. Ensuring that your GIS staff has a grasp of the BI system's data model, and that the stewards of the BI system appreciate the potential complexities of GIS, is a good first step toward a successful effort.

Doug Kolom is manager of GeoDecisions' Chicago Office, and has overseen several BI/GIS integration efforts in the commercial and public sectors.
